Add realtime_trace_jsonl recipe for structured real-time optimization progress streaming #1177
MySweetEden wants to merge 20 commits into scipopt:master
Conversation
- … handling; rename optimize_with_trace to optimizeTrace for clarity
- …nified event writing method, improving clarity and consistency in event handling.
- …ction, enhancing test coverage for both optimizeTrace and optimizeNogilTrace. Update assertions for trace data consistency.
- …tracking. This update introduces a comprehensive docstring for the _TraceRun class, detailing its purpose, arguments, return values, and usage examples. This enhancement improves code documentation and usability for future developers.
- …racking with JSONL output. This commit introduces the realtime_trace_jsonl recipe, which allows for real-time tracking of optimization progress and outputs the data in JSONL format. Additionally, the CHANGELOG has been updated to reflect this new feature.
- …uments for clarity
Pull request overview
Adds a new PySCIPOpt recipe to stream structured optimization progress in real time using JSONL, enabling external tailing/monitoring while the solver runs.
Changes:
- Introduces the `realtime_trace_jsonl` recipe with `optimizeTrace()`/`optimizeNogilTrace()` to record selected SCIP events into `model.data["trace"]` and optionally a JSONL file (see the usage sketch below).
- Records `bestsol_found`, `dualbound_improved`, and a final `run_end` event, with flushing intended for real-time consumption.
- Adds tests covering in-memory tracing, file output, and interrupt handling; updates the changelog.
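For orientation, here is a minimal usage sketch based on the description above. The import path, the assumption that `optimizeTrace()` is a free function taking the model, and the `path` keyword are illustrative guesses, not the PR's exact API.

```python
# Minimal usage sketch based on this PR's description; the import path,
# function form, and the path= keyword are assumptions, not the final API.
from pyscipopt import Model
from pyscipopt.recipes.realtime_trace_jsonl import optimizeTrace

model = Model()
x = model.addVar(vtype="C", name="x", ub=10)
model.setObjective(x, sense="maximize")

# Solve while streaming progress events to a JSONL file that another
# process can tail; an in-memory copy lands in model.data["trace"].
optimizeTrace(model, path="trace.jsonl")

for event in model.data["trace"]:
    print(event["type"], event.get("primalbound"))
```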
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/pyscipopt/recipes/realtime_trace_jsonl.py | Implements the real-time JSONL tracing recipe and event handling. |
| tests/test_recipe_realtime_trace_jsonl.py | Adds tests for in-memory traces, JSONL file output, and interruption behavior. |
| CHANGELOG.md | Documents the addition of the new recipe. |
```python
self._handler = _TraceEventhdlr()
self.model.includeEventhdlr(
    self._handler, "realtime_trace_jsonl", "Realtime trace jsonl handler"
)
```
includeEventhdlr() registers an event handler plugin permanently (there is no corresponding remove/uninclude API). Calling optimizeTrace()/optimizeNogilTrace() multiple times on the same model will attempt to include another handler with the same name (realtime_trace_jsonl), which can raise a SCIP error and/or leave multiple live handlers capturing closed file handles and old _TraceRun instances. Refactor to include the handler at most once per model (e.g., stash/reuse it in model.data), and make the handler read its current sink (trace list / file handle) from mutable attributes rather than a closure over a per-run object.
Not Addressed
1. includeEventhdlr() multiple invocation issue
The concern about permanent handler registration is valid:
- `includeEventhdlr()` registers handlers permanently with no removal API
- `dropEvent()` only unsubscribes from events, not the handler itself
Scope: Refactoring to a handler-reuse pattern would require architectural changes and will be addressed separately. The current implementation assumes single-run usage.
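For reference, a minimal sketch of the handler-reuse pattern being discussed, reusing the PR's `_TraceEventhdlr` class but with invented attribute and `model.data` key names; this is not the PR's implementation.

```python
# Sketch of the suggested handler-reuse pattern (illustrative only; reuses
# the PR's _TraceEventhdlr but the attribute/key names are invented).
def _get_or_create_handler(model):
    if model.data is None:
        model.data = {}
    handler = model.data.get("_trace_handler")
    if handler is None:
        # Register the event handler plugin at most once per model.
        handler = _TraceEventhdlr()
        model.includeEventhdlr(
            handler, "realtime_trace_jsonl", "Realtime trace jsonl handler"
        )
        model.data["_trace_handler"] = handler
    return handler


def optimizeTrace(model, path=None):
    handler = _get_or_create_handler(model)
    # Point the single registered handler at this run's sinks via mutable
    # attributes instead of a closure over a per-run _TraceRun object.
    handler.trace = model.data.setdefault("trace", [])
    handler.file = open(path, "a", encoding="utf-8") if path else None
    try:
        model.optimize()
    finally:
        if handler.file is not None:
            handler.file.close()
            handler.file = None
```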
```python
self._write_event(
    "dualbound_improved", fields=snapshot, flush=False
)
```
For a recipe marketed as “real-time JSONL streaming”, not flushing dualbound_improved events can delay visibility for external consumers tailing the file. Consider flushing here as well (or making flushing policy configurable), especially since dualbound_improved is one of the primary progress signals you record.
Not Addressed
2. dualbound_improved flush policy
dualbound_improved events are intentionally not flushed:
- Frequency asymmetry: `dualbound_improved` fires hundreds to thousands of times during optimization, while `bestsol_found` fires only a few dozen times at most; flushing on every dual bound update would accumulate significant I/O overhead
- OS buffering suffices: events naturally flush within seconds via OS buffering, providing adequate real-time visibility
- Context: optimizations typically run for minutes to hours, making second-scale buffering delays negligible
Discussion: I'm open to reconsidering the flush policy if there are use cases where immediate flushing of dualbound_improved events is valuable (e.g., sub-minute monitoring). Would making it configurable be useful, or is the current approach acceptable?
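If configurability turns out to be the preferred answer, one possible shape is a per-event flush switch; the helper and the `flush_dualbound` parameter below are hypothetical, not this PR's API.

```python
import json

# Hypothetical sketch: always flush the low-frequency events, and flush
# dualbound_improved only when the caller opts in.
def write_event(fh, event_type, fields, flush_dualbound=False):
    fh.write(json.dumps({"type": event_type, **fields}) + "\n")
    if event_type != "dualbound_improved" or flush_dualbound:
        fh.flush()
```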
I’ll address the comments over the weekend and push updates soon.
- …ntainability. Introduced a set to track caught events, ensuring proper cleanup during event execution. Updated event initialization and execution methods for consistency.
- … cleanup process. Added note regarding flushing behavior for dualbound_improved events.
Addressed 1.
I addressed the actionable review items and all checks are green. A couple of higher-level/trade-off points are intentionally left open for discussion. Could you take another look when you have time?
Hey @MySweetEden, yes I will have a look! I will try to lay low for a little bit, for my own sake, but this should get merged, don't worry :)
Motivation
PySCIPOpt already has recipe(s) that store optimization progress in memory. However, in-memory traces are not suitable for real-time, external observation (e.g., another process tailing progress, dashboards, log collectors).
This recipe focuses on the missing piece: a stream-friendly, structured output that can be consumed outside the running Python process during solve.
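As a concrete illustration of that external-consumer scenario, a minimal tailer could look like the sketch below; the file name `trace.jsonl` and the polling interval are assumptions.

```python
import json
import time

# Minimal sketch of an external process tailing the JSONL stream.
buf = ""
with open("trace.jsonl", encoding="utf-8") as fh:
    while True:
        chunk = fh.readline()
        if not chunk:
            time.sleep(1.0)  # wait for the solver to append more events
            continue
        buf += chunk
        if not buf.endswith("\n"):
            continue  # a line is still being written; wait for the rest
        event = json.loads(buf)
        buf = ""
        print(event["type"], event.get("primalbound"), event.get("dualbound"))
        if event["type"] == "run_end":
            break
```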
Design Decisions
- Uses the same field schema as `setTracefile()` (PR "Add setTracefile() method for structured optimization progress logging" / "Add settracefile api", #1158): `type`, `time`, `primalbound`, `dualbound`, `gap`, `nodes`, `nsol`, for consistency across tracing approaches.
- The trace is also kept in `model.data["trace"]` for convenience/testing, but the recipe is centered on file streaming via `path=...`
- A `run_end` record is written on normal termination, interruption, or exception
- `run_end` includes structured error metadata (status, exception type, message)
- `run_end` is flushed to make completion detection reliable

Events Recorded
- `bestsol_found`: when a new best solution is found
- `dualbound_improved`: when the dual bound improves
- `run_end`: when optimization terminates (also emitted on interrupt/exception)

Fields
`type`, `time`, `primalbound`, `dualbound`, `gap`, `nodes`, `nsol` (aligned with the JSONL trace schema introduced in PR #1158)
(`run_end` may additionally include `status`, `exception`, `message` on failure)
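Purely for illustration, two lines of the stream could look like the following; all values are invented, not output from the recipe, and the extra `status`/`exception`/`message` keys would appear only on failure per the note above.

```jsonl
{"type": "bestsol_found", "time": 3.2, "primalbound": 105.0, "dualbound": 98.7, "gap": 0.064, "nodes": 120, "nsol": 2}
{"type": "run_end", "time": 15.8, "primalbound": 100.0, "dualbound": 100.0, "gap": 0.0, "nodes": 530, "nsol": 4}
```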