I've been running tests exclusively on pisto for the past few days and I'm a bit surprised by how few jobs are actually running at once.
In my case, I have:
- an opam-health-check instance giving exactly 80 jobs to the test pool
- the 80 jobs fill the worker capacity to its maximum
However, most of the jobs stop at:
[...]
---> using "486382d4d06e1d3cd81a0566311ede22726942ab1a24bce8a41318c6a5717b58" from cache
/: (run (cache (opam-archives (target /home/opam/.opam/download-cache)))
(network host)
(shell "<cmd>"))
I've only seen a maximum of 23 out of the 79 jobs actually start said command. I'm not sure what's happening to the rest (runc isn't even started).
Maybe there is some kind of IO bottleneck, partially caused by ocaml/opam#4586, and maybe the opam-archives cache is too big and btrfs is struggling to pull it (?)
The load average of the machine in this state is around 15%, so if there is a bottleneck it must be some kind of IO or syscall bottleneck.
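One way to probe the IO hypothesis might be to count how many processes sit in uninterruptible sleep (state `D` in `/proc/<pid>/stat`), which on Linux usually means they are waiting on IO. This is only a rough sketch, assuming a Linux `/proc` filesystem on the worker; the function names `parse_state` and `count_states` are made up for illustration and are not part of any of the tools mentioned above.

```python
import os
from collections import Counter


def parse_state(stat_text):
    """Return the one-letter process state from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' rather than on whitespace.
    after_comm = stat_text[stat_text.rfind(")") + 1:]
    return after_comm.split()[0]


def count_states(proc_root="/proc"):
    """Count processes per state letter (R = running, D = uninterruptible sleep, ...)."""
    counts = Counter()
    if not os.path.isdir(proc_root):
        # Not a Linux-style /proc: nothing to count.
        return counts
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are PIDs
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                counts[parse_state(f.read())] += 1
        except (OSError, IndexError):
            # The process exited while we were scanning, or is unreadable.
            continue
    return counts


if __name__ == "__main__":
    counts = count_states()
    print("running (R):", counts.get("R", 0))
    print("uninterruptible sleep (D):", counts.get("D", 0))
```

If many processes show up in state `D` while the jobs are stuck, that would be consistent with an IO bottleneck; if not, the stall is probably somewhere else.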