Skip to content

Io report caller interval#60

Merged
vladkens merged 6 commits into
vladkens:mainfrom
homm:io-report-caller-interval
Jun 9, 2026
Merged

Io report caller interval#60
vladkens merged 6 commits into
vladkens:mainfrom
homm:io-report-caller-interval

Conversation

@homm

@homm homm commented Apr 28, 2026

Copy link
Copy Markdown
Contributor

This PR is a port of the timer-related changes from #59.

Sampling and calculation fixes

Sampling now matches the requested interval

Previously, a requested 1 s interval was internally handled as 4 x 250 ms samples plus an extra averaging pass, with the sampling loop implemented in IOReport::get_sample.

In this branch, the same 1 s interval is sampled as an actual 1 s interval instead of being split into smaller internal windows.

That earlier 4 x 250 ms scheme was introduced as a workaround for the behavior described in issue #10. However, the later investigation showed that the original problem had a different cause: the older implementation sampled only a fixed 80 ms window and then slept for the remainder of the interval, while get_metrics used that short window as the sampling duration.

Now that sampling is structured around consecutive real samples and their delta, the reported metrics are derived from the actual elapsed interval itself. Because of that, the extra 4 x 250 ms smoothing loop is no longer necessary. It also allowed the IOReport sampling path to become substantially simpler by removing the internal multi-sample orchestration that used to live in IOReport::get_sample.

Interval management is now fully owned by the caller

Another related change is that sampling no longer sleeps internally.

Previously, IOReport itself managed timing by sleeping between internal samples in IOReport::get_sample. In this branch, that responsibility is moved out of IOReport::get_sample, which now just computes the delta between the previous sample and the current one. It does not block, does not sleep, and does not decide how often sampling should happen.

This is a better separation of responsibilities:

  • IOReport is responsible only for sample-to-sample delta calculation;
  • the caller is responsible for scheduling and interval management;
  • the elapsed time used for metric calculation is the actual measured interval, not a synthetic sub-interval chosen inside the sampling layer.

Reduced interval drift

The measurements and timings were changed since #59 to account for the 100ms minimum interval.

The same change also reduces drift across repeated short samples.

Startup time, which is not changed in this version:

$ time macmon pipe -i 100 -s 1 > /dev/null
0.12s user 0.53s system 22% cpu 2.925 total

Excluding the 100ms interval, it is about 2.925s - 100ms = 2.825s.

With -i 100 -s 100, the requested sampling time is 100 x 100ms = 10s:

$ time macmon pipe -i 100 -s 100 > /dev/null
0.83s user 3.35s system 19% cpu 21.015 total

$ time target/release/macmon pipe -i 100 -s 100 > /dev/null
0.32s user 1.61s system 15% cpu 12.821 total

The total runtime excluding startup dropped from about 21.015s - 2.825s = 18.19s to about 12.821s - 2.825s = 9.996s, which perfectly matches the requested 10s. In the old versions, there was a large 8s drift due to ignoring the IOReportCreateSamples time itself multiplied by 4.

homm and others added 2 commits April 28, 2026 21:16
Co-authored-by: Codex <codex@openai.com>
Co-authored-by: Codex <codex@openai.com>
@homm

homm commented Apr 28, 2026

Copy link
Copy Markdown
Contributor Author

@vladkens unlike #59, this is not a draft. It is an isolated change based on the current main branch, with a few improvements over the original #59 implementation.

@vladkens

vladkens commented Jun 9, 2026

Copy link
Copy Markdown
Owner

Thanks for the PR. I agree with the drift issue: the current internal sampling can accumulate extra delay from IOReport calls, and that part is worth fixing.

I want to preserve the public get_metrics(duration) API. If people use macmon as a library, this should stay compatible.

The duration-based sampling and internal smoothing are intentional. I tuned this path so short/cyclic loads produce graphs close to Activity Monitor, so replacing it with a fully caller-owned loop is not a direct swap.

For the manual sampling API, the tricky part is stale windows: if the caller stops polling for a while, “metrics since last call” can become an average over a long gap instead of current metrics. I’ll keep that in mind while reworking this PR so we can support manual sampling without breaking the existing behavior.

@vladkens

vladkens commented Jun 9, 2026

Copy link
Copy Markdown
Owner

I think the manual API should be added, but without replacing the existing get_metrics(duration) API.

My preferred shape is:

sampler.get_metrics(duration_ms) -> Result<Metrics>
sampler.get_metrics_now(stale_after_ms) -> Result<Option<Metrics>>

get_metrics(duration_ms) stays as the canonical blocking/smoothed API. It owns the sampling window and keeps the current 4-sample smoothing behavior.

get_metrics_now(stale_after_ms) is the manual/caller-owned API:

  • it does not sleep;
  • it does not do the 4-sample smoothing;
  • the first call stores a baseline and returns Ok(None);
  • later calls return Ok(Some(metrics)) and move the baseline forward;
  • if the elapsed window is larger than stale_after_ms, the old baseline is stale, so it is replaced and the function returns Ok(None).

This keeps the public API compatible, avoids a separate reset/start API, and prevents a stale-window bug where a caller stops polling for a while and then receives metrics averaged over the whole gap as if they were current metrics.

@vladkens

vladkens commented Jun 9, 2026

Copy link
Copy Markdown
Owner

Final update before merge: I reworked this PR along the lines discussed above.

What I am taking:

  • the IOReport drift fix, so IOReportCreateSamples overhead no longer accumulates on top of the requested interval;
  • a separate manual sampling API: get_metrics_now(stale_after_ms).

What I am not taking:

  • replacing get_metrics(duration) with fully caller-owned scheduling;
  • removing the existing 4-sample smoothing.

get_metrics(duration) stays compatible and remains the default API for continuous polling loops. The new get_metrics_now(stale_after_ms) covers manual scheduling without requiring a separate reset/start API, and now returns a complete Metrics value for valid sample windows.

Thanks again for the PR. I am going to merge this version.

@vladkens vladkens merged commit f773f17 into vladkens:main Jun 9, 2026
@vladkens vladkens added this to the v0.8.0 milestone Jun 9, 2026
@homm

homm commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

The internal smoothing is intentional. I tuned this path so short/cyclic loads produce graphs close to Activity Monitor

Not sure how Activity Monitor is related since it doesn't show such metrics. 4x smoothing was added after my report of the metrics are not accurate, this is correct. But the real cause of this incorrectness was that the older version collects data for only 80 ms window. So the proper fix is not 4x smoothing, the real fix is make the whole interval a metric window which is already happens.

I have described this in the pr description.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants