feat: add ExitReason tracking for actor lifecycle#163
feat: add ExitReason tracking for actor lifecycle#163ElFantasma wants to merge 3 commits intomainfrom
Conversation
🤖 Kimi Code ReviewThe PR looks solid and correctly implements exit reason tracking for supervision. Here are my observations: ✅ Correctness & Safety
Panic handling - Both async and thread implementations correctly use Race safety - The
|
🤖 Codex Code ReviewFindings:
Open question:
Verification gap: I could not run Automated review by OpenAI Codex · gpt-5.4 · custom prompt |
🤖 Claude Code ReviewPR #163 Review:
|
Greptile SummaryThis PR adds Key changes:
Issues found:
Confidence Score: 4/5Safe to merge with the RAII regression in threads mode addressed; all other findings are P2 style/hardening concerns One P1 issue exists: the threads
|
| Filename | Overview |
|---|---|
| concurrency/src/threads/actor.rs | Mutex changed to Option<ExitReason>; CompletionGuard refactored to carry reason, but now constructed after run_actor — breaking the RAII guarantee that prevented wait_exit() deadlocks on unexpected panics |
| concurrency/src/tasks/actor.rs | Watch channel changed from bool to Option<ExitReason>; wait_exit() added; run_actor returns ExitReason; panic_message duplicated from threads module; wait_exit maps dropped sender to Normal even for task aborts |
| concurrency/src/error.rs | Adds ExitReason enum with Normal, Shutdown, Panic(String), Kill variants, is_abnormal(), and Display impl; missing PartialEq derive limits supervision use |
| concurrency/src/lib.rs | Re-exports ExitReason alongside ActorError — straightforward one-line change |
| examples/exit_reason/src/main.rs | New runnable example demonstrating all four exit paths (clean stop, handler panic, started panic, polling); well-structured and accurate |
| examples/exit_reason/Cargo.toml | New example package manifest, correctly declares workspace dependencies |
Sequence Diagram
sequenceDiagram
participant Caller
participant ActorRef
participant run_actor
participant CompletionChannel
Caller->>ActorRef: .start()
ActorRef->>CompletionChannel: init(None)
ActorRef->>run_actor: spawn(actor, ctx, rx, token)
alt started() panics
run_actor-->>run_actor: catch_unwind → ExitReason::Panic
else started() calls ctx.stop()
run_actor-->>run_actor: ExitReason::Normal
else normal flow
loop message loop
run_actor->>run_actor: recv message
alt handler panics
run_actor-->>run_actor: exit_reason = Panic
run_actor-->>run_actor: break
else ctx.stop() called
run_actor-->>run_actor: break (Normal)
end
end
run_actor->>run_actor: stopped() [catch_unwind]
end
run_actor->>CompletionChannel: send(Some(ExitReason))
CompletionChannel-->>ActorRef: notify
Caller->>ActorRef: .wait_exit()
ActorRef->>CompletionChannel: wait for Some(reason)
CompletionChannel-->>ActorRef: Some(ExitReason)
ActorRef-->>Caller: ExitReason
Prompt To Fix All With AI
This is a comment left during a code review.
Path: concurrency/src/threads/actor.rs
Line: 412-420
Comment:
**RAII guarantee broken — `wait_exit()` can deadlock if `run_actor` panics unexpectedly**
The old code constructed `CompletionGuard` *before* calling `run_actor`, so the guard's `Drop` impl would always fire (writing `Some(reason)` and calling `cvar.notify_all()`) even if `run_actor` panicked at the thread level:
```rust
// OLD — always notifies, even on unexpected panic
let _guard = CompletionGuard(completion);
run_actor(actor, ctx, rx, cancellation_token);
```
The new code constructs the guard *after* `run_actor` returns. If `run_actor` ever panics without being caught (e.g., a future bug in the framework, a stack overflow, or a `catch_unwind` boundary issue), the guard is never created, `completion` is just dropped as an `Arc`, and any thread blocked in `wait_exit()` → `cvar.wait(…)` will deadlock forever.
Note that the `tasks` mode is unaffected because dropping `completion_tx` causes `rx.changed()` to return `Err`, so `wait_exit()` returns `ExitReason::Normal` instead of hanging.
A safe fix is to wrap `run_actor` at the thread closure level:
```rust
let _thread_handle = rt::spawn(move || {
let reason = std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
run_actor(actor, ctx, rx, cancellation_token)
}))
.unwrap_or(ExitReason::Panic("unexpected framework panic".to_string()));
let guard = CompletionGuard { completion, reason };
drop(guard);
});
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: concurrency/src/tasks/actor.rs
Line: 383-393
Comment:
**`wait_exit()` silently maps task abortion/cancellation to `ExitReason::Normal`**
When `rx.changed().await` returns `Err`, it means every `watch::Sender` was dropped without ever sending `Some(reason)`. If the spawned task is **aborted** from outside (e.g., the tokio `JoinHandle` is used to abort, or the runtime shuts down mid-execution), the sender is dropped without a value and this branch returns `ExitReason::Normal` — even though the actor was forcibly killed.
For a supervision foundation this silent misclassification matters: a supervisor that only restarts on `is_abnormal()` would silently ignore an aborted actor. Consider returning `ExitReason::Kill` here instead:
```suggestion
if rx.changed().await.is_err() {
return ExitReason::Kill;
}
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: concurrency/src/tasks/actor.rs
Line: 505-510
Comment:
**Panic in `stopped()` is not reflected in the returned `ExitReason`**
When `stopped()` panics, the error is logged but `exit_reason` is unchanged. Two scenarios arise:
1. **Clean stop + `stopped()` panics** → returns `ExitReason::Normal`. A supervisor relying on `is_abnormal()` won't restart the actor despite cleanup code crashing.
2. **Handler panic + `stopped()` panics** → returns the original `Panic("panic in handler: …")`, which is correct but hides the secondary cleanup failure.
The same logic applies to the equivalent lines in `concurrency/src/threads/actor.rs` (lines 479–485).
If panics in `stopped()` should be surfaced, consider:
```rust
if let Err(panic) = stop_result {
let msg = panic_message(&panic);
tracing::error!("Panic in stopped() callback: {msg}");
if !exit_reason.is_abnormal() {
exit_reason = ExitReason::Panic(format!("panic in stopped(): {msg}"));
}
}
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: concurrency/src/error.rs
Line: 2-12
Comment:
**`ExitReason` is missing `PartialEq` — limits usability for supervision**
`ExitReason` derives `Debug` and `Clone` but not `PartialEq`. Supervision trees will almost certainly need to compare exit reasons directly. Without `PartialEq`, callers must use `matches!` patterns everywhere and cannot use `==` or `!=`.
```suggestion
#[derive(Debug, Clone, PartialEq)]
pub enum ExitReason {
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: concurrency/src/tasks/actor.rs
Line: 515-523
Comment:
**`panic_message` is duplicated across both actor modules**
This identical helper function is defined in both `concurrency/src/tasks/actor.rs` (lines 515–523) and `concurrency/src/threads/actor.rs` (lines 490–498). Consider moving it to `concurrency/src/error.rs` or a shared internal utilities module so it can be reused without duplication.
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "feat: add ExitReason tracking for actor ..." | Re-trigger Greptile
| let _thread_handle = rt::spawn(move || { | ||
| let _guard = CompletionGuard(completion); | ||
| run_actor(actor, ctx, rx, cancellation_token); | ||
| let reason = run_actor(actor, ctx, rx, cancellation_token); | ||
| let guard = CompletionGuard { | ||
| completion, | ||
| reason, | ||
| }; | ||
| drop(guard); | ||
| }); | ||
|
|
There was a problem hiding this comment.
RAII guarantee broken —
wait_exit() can deadlock if run_actor panics unexpectedly
The old code constructed CompletionGuard before calling run_actor, so the guard's Drop impl would always fire (writing Some(reason) and calling cvar.notify_all()) even if run_actor panicked at the thread level:
// OLD — always notifies, even on unexpected panic
let _guard = CompletionGuard(completion);
run_actor(actor, ctx, rx, cancellation_token);The new code constructs the guard after run_actor returns. If run_actor ever panics without being caught (e.g., a future bug in the framework, a stack overflow, or a catch_unwind boundary issue), the guard is never created, completion is just dropped as an Arc, and any thread blocked in wait_exit() → cvar.wait(…) will deadlock forever.
Note that the tasks mode is unaffected because dropping completion_tx causes rx.changed() to return Err, so wait_exit() returns ExitReason::Normal instead of hanging.
A safe fix is to wrap run_actor at the thread closure level:
let _thread_handle = rt::spawn(move || {
let reason = std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
run_actor(actor, ctx, rx, cancellation_token)
}))
.unwrap_or(ExitReason::Panic("unexpected framework panic".to_string()));
let guard = CompletionGuard { completion, reason };
drop(guard);
});Prompt To Fix With AI
This is a comment left during a code review.
Path: concurrency/src/threads/actor.rs
Line: 412-420
Comment:
**RAII guarantee broken — `wait_exit()` can deadlock if `run_actor` panics unexpectedly**
The old code constructed `CompletionGuard` *before* calling `run_actor`, so the guard's `Drop` impl would always fire (writing `Some(reason)` and calling `cvar.notify_all()`) even if `run_actor` panicked at the thread level:
```rust
// OLD — always notifies, even on unexpected panic
let _guard = CompletionGuard(completion);
run_actor(actor, ctx, rx, cancellation_token);
```
The new code constructs the guard *after* `run_actor` returns. If `run_actor` ever panics without being caught (e.g., a future bug in the framework, a stack overflow, or a `catch_unwind` boundary issue), the guard is never created, `completion` is just dropped as an `Arc`, and any thread blocked in `wait_exit()` → `cvar.wait(…)` will deadlock forever.
Note that the `tasks` mode is unaffected because dropping `completion_tx` causes `rx.changed()` to return `Err`, so `wait_exit()` returns `ExitReason::Normal` instead of hanging.
A safe fix is to wrap `run_actor` at the thread closure level:
```rust
let _thread_handle = rt::spawn(move || {
let reason = std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
run_actor(actor, ctx, rx, cancellation_token)
}))
.unwrap_or(ExitReason::Panic("unexpected framework panic".to_string()));
let guard = CompletionGuard { completion, reason };
drop(guard);
});
```
How can I resolve this? If you propose a fix, please make it concise.| pub async fn wait_exit(&self) -> ExitReason { | ||
| let mut rx = self.completion_rx.clone(); | ||
| while !*rx.borrow_and_update() { | ||
| loop { | ||
| if let Some(reason) = rx.borrow_and_update().clone() { | ||
| return reason; | ||
| } | ||
| if rx.changed().await.is_err() { | ||
| break; | ||
| return ExitReason::Normal; | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
wait_exit() silently maps task abortion/cancellation to ExitReason::Normal
When rx.changed().await returns Err, it means every watch::Sender was dropped without ever sending Some(reason). If the spawned task is aborted from outside (e.g., the tokio JoinHandle is used to abort, or the runtime shuts down mid-execution), the sender is dropped without a value and this branch returns ExitReason::Normal — even though the actor was forcibly killed.
For a supervision foundation this silent misclassification matters: a supervisor that only restarts on is_abnormal() would silently ignore an aborted actor. Consider returning ExitReason::Kill here instead:
| pub async fn wait_exit(&self) -> ExitReason { | |
| let mut rx = self.completion_rx.clone(); | |
| while !*rx.borrow_and_update() { | |
| loop { | |
| if let Some(reason) = rx.borrow_and_update().clone() { | |
| return reason; | |
| } | |
| if rx.changed().await.is_err() { | |
| break; | |
| return ExitReason::Normal; | |
| } | |
| } | |
| } | |
| if rx.changed().await.is_err() { | |
| return ExitReason::Kill; | |
| } |
Prompt To Fix With AI
This is a comment left during a code review.
Path: concurrency/src/tasks/actor.rs
Line: 383-393
Comment:
**`wait_exit()` silently maps task abortion/cancellation to `ExitReason::Normal`**
When `rx.changed().await` returns `Err`, it means every `watch::Sender` was dropped without ever sending `Some(reason)`. If the spawned task is **aborted** from outside (e.g., the tokio `JoinHandle` is used to abort, or the runtime shuts down mid-execution), the sender is dropped without a value and this branch returns `ExitReason::Normal` — even though the actor was forcibly killed.
For a supervision foundation this silent misclassification matters: a supervisor that only restarts on `is_abnormal()` would silently ignore an aborted actor. Consider returning `ExitReason::Kill` here instead:
```suggestion
if rx.changed().await.is_err() {
return ExitReason::Kill;
}
```
How can I resolve this? If you propose a fix, please make it concise.| cancellation_token.cancel(); | ||
| let stop_result = AssertUnwindSafe(actor.stopped(&ctx)).catch_unwind().await; | ||
| if let Err(panic) = stop_result { | ||
| tracing::error!("Panic in stopped() callback: {panic:?}"); | ||
| let msg = panic_message(&panic); | ||
| tracing::error!("Panic in stopped() callback: {msg}"); | ||
| } |
There was a problem hiding this comment.
Panic in
stopped() is not reflected in the returned ExitReason
When stopped() panics, the error is logged but exit_reason is unchanged. Two scenarios arise:
- Clean stop +
stopped()panics → returnsExitReason::Normal. A supervisor relying onis_abnormal()won't restart the actor despite cleanup code crashing. - Handler panic +
stopped()panics → returns the originalPanic("panic in handler: …"), which is correct but hides the secondary cleanup failure.
The same logic applies to the equivalent lines in concurrency/src/threads/actor.rs (lines 479–485).
If panics in stopped() should be surfaced, consider:
if let Err(panic) = stop_result {
let msg = panic_message(&panic);
tracing::error!("Panic in stopped() callback: {msg}");
if !exit_reason.is_abnormal() {
exit_reason = ExitReason::Panic(format!("panic in stopped(): {msg}"));
}
}Prompt To Fix With AI
This is a comment left during a code review.
Path: concurrency/src/tasks/actor.rs
Line: 505-510
Comment:
**Panic in `stopped()` is not reflected in the returned `ExitReason`**
When `stopped()` panics, the error is logged but `exit_reason` is unchanged. Two scenarios arise:
1. **Clean stop + `stopped()` panics** → returns `ExitReason::Normal`. A supervisor relying on `is_abnormal()` won't restart the actor despite cleanup code crashing.
2. **Handler panic + `stopped()` panics** → returns the original `Panic("panic in handler: …")`, which is correct but hides the secondary cleanup failure.
The same logic applies to the equivalent lines in `concurrency/src/threads/actor.rs` (lines 479–485).
If panics in `stopped()` should be surfaced, consider:
```rust
if let Err(panic) = stop_result {
let msg = panic_message(&panic);
tracing::error!("Panic in stopped() callback: {msg}");
if !exit_reason.is_abnormal() {
exit_reason = ExitReason::Panic(format!("panic in stopped(): {msg}"));
}
}
```
How can I resolve this? If you propose a fix, please make it concise.
concurrency/src/error.rs
Outdated
| #[derive(Debug, Clone)] | ||
| pub enum ExitReason { | ||
| /// Clean stop via `ctx.stop()` or channel closure. | ||
| Normal, | ||
| /// Ordered shutdown from a supervisor or linked actor. | ||
| Shutdown, | ||
| /// Actor panicked in `started()`, a handler, or `stopped()`. | ||
| Panic(String), | ||
| /// Untrappable kill signal. | ||
| Kill, | ||
| } |
There was a problem hiding this comment.
ExitReason is missing PartialEq — limits usability for supervision
ExitReason derives Debug and Clone but not PartialEq. Supervision trees will almost certainly need to compare exit reasons directly. Without PartialEq, callers must use matches! patterns everywhere and cannot use == or !=.
| #[derive(Debug, Clone)] | |
| pub enum ExitReason { | |
| /// Clean stop via `ctx.stop()` or channel closure. | |
| Normal, | |
| /// Ordered shutdown from a supervisor or linked actor. | |
| Shutdown, | |
| /// Actor panicked in `started()`, a handler, or `stopped()`. | |
| Panic(String), | |
| /// Untrappable kill signal. | |
| Kill, | |
| } | |
| #[derive(Debug, Clone, PartialEq)] | |
| pub enum ExitReason { |
Prompt To Fix With AI
This is a comment left during a code review.
Path: concurrency/src/error.rs
Line: 2-12
Comment:
**`ExitReason` is missing `PartialEq` — limits usability for supervision**
`ExitReason` derives `Debug` and `Clone` but not `PartialEq`. Supervision trees will almost certainly need to compare exit reasons directly. Without `PartialEq`, callers must use `matches!` patterns everywhere and cannot use `==` or `!=`.
```suggestion
#[derive(Debug, Clone, PartialEq)]
pub enum ExitReason {
```
How can I resolve this? If you propose a fix, please make it concise.
concurrency/src/tasks/actor.rs
Outdated
| fn panic_message(panic: &Box<dyn std::any::Any + Send>) -> String { | ||
| if let Some(s) = panic.downcast_ref::<&str>() { | ||
| s.to_string() | ||
| } else if let Some(s) = panic.downcast_ref::<String>() { | ||
| s.clone() | ||
| } else { | ||
| format!("{panic:?}") | ||
| } | ||
| } |
There was a problem hiding this comment.
panic_message is duplicated across both actor modules
This identical helper function is defined in both concurrency/src/tasks/actor.rs (lines 515–523) and concurrency/src/threads/actor.rs (lines 490–498). Consider moving it to concurrency/src/error.rs or a shared internal utilities module so it can be reused without duplication.
Prompt To Fix With AI
This is a comment left during a code review.
Path: concurrency/src/tasks/actor.rs
Line: 515-523
Comment:
**`panic_message` is duplicated across both actor modules**
This identical helper function is defined in both `concurrency/src/tasks/actor.rs` (lines 515–523) and `concurrency/src/threads/actor.rs` (lines 490–498). Consider moving it to `concurrency/src/error.rs` or a shared internal utilities module so it can be reused without duplication.
How can I resolve this? If you propose a fix, please make it concise.
Bot Review ResponsesAll bot findings have been addressed in 4ec9859 and f37e99b. Summary below. Kimi
Fixed. Changed to
Fixed. Restored RAII pattern with
Codex
Fixed. If
Fixed. Guard is now constructed BEFORE
Fixed. Returns
Added doc comments: "Not yet produced — reserved for supervision tree implementation." Claude
Fixed. Same as Codex finding above.
Fixed. Added
Fixed. Moved to
Fixed. Changed to
Fixed. See Kimi response above.
Fixed. Now uses
Disregarded. The GreptileAll findings overlap with Codex/Claude above — all addressed. The sequence diagram and summary are accurate. |
Summary
ExitReasonenum (Normal,Shutdown,Panic(String),Kill) withis_abnormal()methodActorRef::wait_exit()returns the exit reason (async in tasks, blocking in threads)ActorRef::exit_reason()polls the reason without blockingjoin()unchanged for backwards compatibilityexit_reasonexample demonstrating all exit pathsFoundation for supervision trees (#131, #133). The watch channel changes from
booltoOption<ExitReason>in tasks mode, andMutex<bool>toMutex<Option<ExitReason>>in threads mode.run_actor()now returnsExitReasonon every exit path: clean stop, channel closure, panic in started(), and panic in handler.Test plan
cargo run -p exit_reason