
Add realtime_trace_jsonl recipe for structured real-time optimization progress streaming#1177

Open
MySweetEden wants to merge 20 commits into scipopt:master from MySweetEden:realtime-trace-jsonl

Conversation

@MySweetEden
Contributor

Motivation

PySCIPOpt already has recipe(s) that store optimization progress in memory. However, in-memory traces are not suitable for real-time, external observation (e.g., another process tailing progress, dashboards, log collectors).

This recipe focuses on the missing piece: a stream-friendly, structured output that can be consumed outside the running Python process during solve.

Design Decisions

  • JSONL format: Designed for streaming writes and partial reads; remains readable even if the run is interrupted or crashes
  • Real-time external output is the primary value:
    • Records progress updates as one JSON object per line
    • Flushes on key events so downstream consumers can react immediately
  • Schema compatibility with setTracefile() (PR #1158, "Add setTracefile() method for structured optimization progress logging"):
    • Uses the same JSONL field names (type, time, primalbound, dualbound, gap, nodes, nsol) for consistency across tracing approaches.
  • In-memory + file output: Keeps model.data["trace"] for convenience/testing, but the recipe is centered on file streaming via path=...
  • Robust termination signaling for external monitoring:
    • Always emits a final run_end record on normal termination, interruption, or exception
    • On exceptions, run_end includes structured error metadata (status, exception type, message)
    • Flushes run_end to make completion detection reliable

Events Recorded

  • bestsol_found: when a new best solution is found
  • dualbound_improved: when the dual bound improves
  • run_end: when optimization terminates (also emitted on interrupt/exception)

Fields

type, time, primalbound, dualbound, gap, nodes, nsol (aligned with the JSONL trace schema introduced in PR #1158)
(run_end may additionally include: status, exception, message on failure)
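On the consumer side, a separate process could follow the file along these lines. This is a sketch under assumptions: the function name `follow_trace` and its polling parameters are hypothetical, and it relies only on the `type` field and the `run_end` event listed above.

```python
import json
import time


def follow_trace(path, poll_interval=0.2, timeout=None):
    """Yield trace records as they appear; stop after the run_end record."""
    deadline = None if timeout is None else time.monotonic() + timeout
    buffer = ""
    with open(path, "r", encoding="utf-8") as f:
        while True:
            chunk = f.read()
            if chunk:
                buffer += chunk
                # Parse complete lines only; a trailing partial line stays buffered.
                *lines, buffer = buffer.split("\n")
                for line in lines:
                    if not line.strip():
                        continue
                    record = json.loads(line)
                    yield record
                    if record.get("type") == "run_end":
                        return
            else:
                # Nothing new yet: give up on timeout, otherwise wait and retry.
                if deadline is not None and time.monotonic() > deadline:
                    return
                time.sleep(poll_interval)
```

A dashboard could loop over `follow_trace("trace.jsonl")` and, for example, plot `primalbound` and `dualbound` against `time` until the generator ends.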

MySweetEden and others added 13 commits January 20, 2026 03:57
… handling; rename optimize_with_trace to optimizeTrace for clarity
…nified event writing method, improving clarity and consistency in event handling.
…ction, enhancing test coverage for both optimizeTrace and optimizeNogilTrace. Update assertions for trace data consistency.
…tracking

This update introduces a comprehensive docstring for the _TraceRun class, detailing its purpose, arguments, return values, and usage examples. This enhancement improves code documentation and usability for future developers.
…racking with JSONL output

This commit introduces the realtime_trace_jsonl recipe, which allows for real-time tracking of optimization progress and outputs the data in JSONL format. Additionally, the CHANGELOG has been updated to reflect this new feature.
Contributor

Copilot AI left a comment

Pull request overview

Adds a new PySCIPOpt recipe to stream structured optimization progress in real time using JSONL, enabling external tailing/monitoring while the solver runs.

Changes:

  • Introduces realtime_trace_jsonl recipe with optimizeTrace() / optimizeNogilTrace() to record selected SCIP events into model.data["trace"] and optionally a JSONL file.
  • Records bestsol_found, dualbound_improved, and a final run_end event with flushing intended for real-time consumption.
  • Adds tests covering in-memory tracing, file output, and interrupt handling; updates changelog.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

File | Description
--- | ---
src/pyscipopt/recipes/realtime_trace_jsonl.py | Implements the real-time JSONL tracing recipe and event handling.
tests/test_recipe_realtime_trace_jsonl.py | Adds tests for in-memory traces, JSONL file output, and interruption behavior.
CHANGELOG.md | Documents the addition of the new recipe.

Comment on lines +63 to +66
self._handler = _TraceEventhdlr()
self.model.includeEventhdlr(
    self._handler, "realtime_trace_jsonl", "Realtime trace jsonl handler"
)

Copilot AI Jan 28, 2026

includeEventhdlr() registers an event handler plugin permanently (there is no corresponding remove/uninclude API). Calling optimizeTrace()/optimizeNogilTrace() multiple times on the same model will attempt to include another handler with the same name (realtime_trace_jsonl), which can raise a SCIP error and/or leave multiple live handlers capturing closed file handles and old _TraceRun instances. Refactor to include the handler at most once per model (e.g., stash/reuse it in model.data), and make the handler read its current sink (trace list / file handle) from mutable attributes rather than a closure over a per-run object.

Contributor Author

@MySweetEden Jan 31, 2026

Not Addressed

1. includeEventhdlr() multiple invocation issue

The concern about permanent handler registration is valid:

  • includeEventhdlr() registers handlers permanently with no removal API
  • dropEvent() only unsubscribes from events, not the handler itself

Scope: Refactoring to a handler-reuse pattern would require architectural changes and will be addressed separately. The current implementation assumes single-run usage.
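For reference, the handler-reuse pattern suggested in the review could look roughly like the sketch below. It is not part of this PR. The helper `get_or_include_handler`, the `TraceSink` class, the storage key, and the stand-in model are all hypothetical; only `includeEventhdlr(handler, name, desc)` and `model.data` mirror what the recipe already uses, and the stand-in merely imitates a name clash on a second include.

```python
class TraceSink:
    """Mutable per-run target; a long-lived handler would write through it."""

    def __init__(self):
        self.trace = None   # list receiving in-memory records
        self.file = None    # open file handle, or None


def get_or_include_handler(model, make_handler, name="realtime_trace_jsonl"):
    """Include the handler on first use and return the stored one afterwards."""
    if model.data is None:
        model.data = {}
    key = "_" + name + "_handler"   # hypothetical storage key
    handler = model.data.get(key)
    if handler is None:
        handler = make_handler()
        model.includeEventhdlr(handler, name, "Realtime trace jsonl handler")
        model.data[key] = handler
    return handler


class _StandInModel:
    """Minimal stand-in so the sketch runs without SCIP installed."""

    def __init__(self):
        self.data = None
        self.included = []

    def includeEventhdlr(self, handler, name, desc):
        if name in self.included:
            raise RuntimeError("event handler name already in use")
        self.included.append(name)


class _StandInHandler:
    def __init__(self):
        self.sink = TraceSink()


model = _StandInModel()
first = get_or_include_handler(model, _StandInHandler)
first.sink.trace = []    # run 1 points the sink at a fresh list
second = get_or_include_handler(model, _StandInHandler)
second.sink.trace = []   # run 2 swaps the sink; no second include happens
```

The point of the mutable sink is that the handler never closes over a per-run object, so a second traced run only replaces `sink.trace` and `sink.file`.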

Comment on lines +59 to +61
self._write_event(
    "dualbound_improved", fields=snapshot, flush=False
)

Copilot AI Jan 28, 2026

For a recipe marketed as “real-time JSONL streaming”, not flushing dualbound_improved events can delay visibility for external consumers tailing the file. Consider flushing here as well (or making flushing policy configurable), especially since dualbound_improved is one of the primary progress signals you record.

Contributor Author

Not Addressed

2. dualbound_improved flush policy

dualbound_improved events are intentionally not flushed:

  • Frequency asymmetry: dualbound_improved fires hundreds to thousands of times during optimization, while bestsol_found fires only a few dozen times at most; flushing on every dual bound update would accumulate significant I/O overhead
  • OS buffering suffices: Events naturally flush within seconds via OS buffering, providing adequate real-time visibility
  • Context: Optimizations typically run for minutes to hours, making second-scale buffering delays negligible

Discussion: I'm open to reconsidering the flush policy if there are use cases where immediate flushing of dualbound_improved events is valuable (e.g., sub-minute monitoring). Would making it configurable be useful, or is the current approach acceptable?
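If a configurable policy turns out to be wanted, one possible middle ground is time-based throttling: flush rare events immediately and frequent ones at most once per interval. The sketch below only illustrates that idea. `ThrottledJsonlWriter`, `min_interval`, and `always_flush` are hypothetical names that do not exist in the recipe.

```python
import json
import time


class ThrottledJsonlWriter:
    """Flush selected event types at once and all others at most once per interval."""

    def __init__(self, stream, min_interval=1.0,
                 always_flush=("bestsol_found", "run_end"),
                 clock=time.monotonic):
        self.stream = stream
        self.min_interval = min_interval
        self.always_flush = frozenset(always_flush)
        self.clock = clock            # injectable so the policy can be tested
        self._last_flush = clock()

    def write(self, record):
        """Write one record; return True if the stream was flushed."""
        self.stream.write(json.dumps(record) + "\n")
        now = self.clock()
        due = now - self._last_flush >= self.min_interval
        if record["type"] in self.always_flush or due:
            self.stream.flush()
            self._last_flush = now
            return True
        return False
```

With `min_interval=1.0`, a burst of `dualbound_improved` records costs at most one flush per second, and a record that was not flushed reaches the file with the next flush or when the stream is closed.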

@MySweetEden
Contributor Author

I’ll address the comments over the weekend and push updates soon.

…ntainability. Introduced a set to track caught events, ensuring proper cleanup during event execution. Updated event initialization and execution methods for consistency.
… cleanup process. Added note regarding flushing behavior for dualbound_improved events.
@MySweetEden
Contributor Author

Addressed

1. dropEvent() refcount underflow prevention

SCIP does not provide a dedicated "solve finished" event, so we cannot rely on an event-driven shutdown callback (like eventexit()) for cleanup. Therefore, cleanup is performed in __exit__, with guards to avoid invalid dropEvent() calls.

Implementation:

  • Added self._caught_events (set) to track which events were successfully caught in eventinit()
  • Modified __exit__ to drop only the events that were actually caught
  • This prevents incorrect dropEvent() calls if eventinit() partially fails
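The guard described above can be sketched in isolation as follows. `TrackedSubscriptions` is a hypothetical helper, not the PR's `_TraceRun`. It assumes only that the model exposes `catchEvent(eventtype, eventhdlr)` and `dropEvent(eventtype, eventhdlr)`, and in real use the event types would be values such as `SCIP_EVENTTYPE.BESTSOLFOUND`.

```python
class TrackedSubscriptions:
    """Remember which events were caught so that only those are dropped."""

    def __init__(self, model, handler):
        self.model = model
        self.handler = handler
        self.caught = set()

    def catch(self, event_type):
        self.model.catchEvent(event_type, self.handler)
        # Reached only if catchEvent did not raise.
        self.caught.add(event_type)

    def drop_all(self):
        # Popping first means a repeated call never drops the same event twice.
        while self.caught:
            self.model.dropEvent(self.caught.pop(), self.handler)
```

Calling `drop_all()` from `__exit__` is then safe even when initialization failed partway through, because events that were never caught are not in the set.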

2. Trace data initialization

Changed from setdefault() to explicit assignment:

self.model.data["trace"] = []

Rationale: Ensures each traced run starts with a fresh list, preventing data from previous runs from mixing and keeping in-memory behavior consistent with file output (which always truncates).
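The difference between the two initializations shows up with a plain dictionary standing in for model.data (the variable names here are illustrative only):

```python
data = {"trace": [{"type": "run_end"}]}   # leftover from an earlier run

data.setdefault("trace", [])              # key exists, so the old list is kept
kept_by_setdefault = len(data["trace"])   # 1 record survives

data["trace"] = []                        # explicit assignment discards it
kept_by_assignment = len(data["trace"])   # 0 records remain
```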

3. Event handler parameter naming

Changed the first parameter of the event handler methods from s to hdlr to improve readability while avoiding a collision with the self of the enclosing _TraceRun methods.
Note: hdlr is a conventional abbreviation in the PySCIPOpt and SCIP codebases.

4. Exception handling and flush comments

Added explanatory comments to exception handling and flush behavior for clarity.



@MySweetEden
Contributor Author

I addressed the actionable review items and all checks are green. A couple of higher-level/trade-off points are intentionally left open for discussion. Could you take another look when you have time?

@Joao-Dionisio
Member

Hey @MySweetEden, yes, I will have a look! I will try to lay low for a little bit, for my own sake, but this should get merged, don't worry :)
