Thanks for all the work you've done to use `sacct` wherever you can; it's been great at reducing load on Slurm.
When running ReFrame at our site, we see evidence that Slurm is being polled at a high enough rate that it is getting automatically limited. Looking at the code, it looks like `_cancel_if_blocked` (which calls `squeue`) is called once per job. Since the full list of jobs is already known at that point (the `poll` function earlier in the same file iterates over that list), it would put less load on Slurm if `squeue` were queried for all the jobs at once in a single call.
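To illustrate the suggestion, here is a minimal sketch of a batched query. It is not ReFrame's actual code: the function names `build_squeue_cmd`, `parse_squeue_output` and `query_blocked_info` are hypothetical, and the choice of output fields (`%i` job ID, `%T` state, `%r` reason) is an assumption about what the blocked-job check needs. It relies only on `squeue` accepting a comma-separated list of job IDs with `-j`.

```python
import subprocess


def build_squeue_cmd(job_ids):
    """Build a single squeue command that covers every job ID given."""
    # -h drops the header line; -j takes a comma-separated job ID list;
    # -o selects job ID, state and reason, separated by '|'.
    return ['squeue', '-h', '-j', ','.join(str(j) for j in job_ids),
            '-o', '%i|%T|%r']


def parse_squeue_output(output):
    """Map job ID -> (state, reason) from '%i|%T|%r' formatted lines."""
    info = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        jobid, state, reason = line.split('|', 2)
        info[jobid] = (state, reason)
    return info


def query_blocked_info(job_ids):
    """Query squeue once for all jobs (hypothetical helper).

    Error handling is left out of this sketch.
    """
    if not job_ids:
        return {}
    proc = subprocess.run(build_squeue_cmd(job_ids),
                          capture_output=True, text=True)
    return parse_squeue_output(proc.stdout)
```

The poll loop could then call `query_blocked_info` once per polling cycle and look up each job's state and reason in the returned dictionary, instead of invoking `squeue` separately for every job.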