Conversation
|
Hmm there's a non-zero risk that this could break something that depends on the current timing, even if flawed. Are we certain this is not a breaking change that needs to be flagged? |
That would depend on how this does or does not interact with the internal spectre mitigations Though either way, relying on intricate timing should be assumed to be unreliable. |
|
I'm likely worrying for nothing and can't cite a specific example. I've just seen enough cases over the years where changing the timing of events == changing the order of events == subtle breaks. It's probably ok and the risk is likely near zero... it just isn't zero. I want to make sure those are kept in mind. |
|
Ok, let's just wait for @kentonv 's review to make sure we're not missing anything |
kentonv
left a comment
There was a problem hiding this comment.
The code in IoContext::now() says:
KJ_IF_SOME(maybeNextTimeout, timeoutManager->getNextTimeout()) {
// Don't return a time beyond when the next setTimeout() callback is intended to run. This
// ensures that Date.now() inside the callback itself always returns exactly the time at which
// the callback was scheduled (hiding non-determinism which could contain side channels), and
// that the time returned by Date.now() never goes backwards.
return kj::min(adjustedTime, maybeNextTimeout);
It seems like this isn't working as intended? Date.now() inside a timeout event doesn't seem to be returning exactly the time that the timeout was scheduled for, as the comment suggests it should.
If we feel that Date.now() should return an exact time during a timeout event, we should fix it so it does. If we feel that this is not what we want, we should remove the code. But I don't think the right answer is to add more code elsewhere which works around the logic.
It may be that we want the exact scheduling behavior in production (where other spectre timer mitigations exist) but not in workerd (which uses exact timers). I need to think about this more but have to run to a meeting now.
|
... and that's why I like to have @kentonv review these timing related ones ;-) ... the code change itself is ok but I had a strong suspicion this wasn't doing the right thing overall. I think it might be worth capturing a bit more detail of how all the timing stuff should work / is supposed to work in a markdown doc to make this stuff easier to analyze so we're not always having to ping Kenton on these ;-) |
|
Since Date.now() in workerd just reports real time, it makes sense that setTimeout() shouldn't do any shenanigans with scheduling. The "clamp to next timeout time" doesn't make sense. In prod, the spectre mitigation is supposed to make it appear that timers do not advance at all while in tight CPU loops, and then makes up the difference by advancing more quickly when asleep. In prod it ought to be the case that if you measure Date.now() before and after setTimeout(), the difference should be exactly the length of the timeout -- because time stops when the CPU is running. I think we just need to push this clamping behavior into |
f9c688a to
cfe283c
Compare
|
This requires a change in downstream repository as well. I've made some changes upon your feedback @kentonv. |
Fixes #6037
When a timeout fires, its timeoutTimes entry isn’t removed until the promise cleanup runs. During the callback, context.now() clamps to that stale past time, so a new setTimeout computed from context.now() can be scheduled in the past and fire immediately.
This change computes the target time from the raw timer channel clock to avoid stale clamping while scheduling. Note: this bypasses context.now()’s clamping (and precise_timers rounding), which may change timing side‑channel behavior.