feat: add topology-aware adaptive controls#282
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds three internal topology-aware control primitives ( ChangesTopology-Aware Adaptive Controls
Pre-commit ty Hook Fix
Sequence Diagram(s)sequenceDiagram
participant Runtime as AdaptiveRuntime
participant Feature as TelemetryFeature
participant ACG as AcgLearner
participant Detector as ConvergenceDetector
participant Backend as AdaptiveBackend
Runtime->>Feature: new(tool_parallelism, acg, convergence)
Feature->>ACG: new_with_convergence(agent_id, window, convergence_cfg)
loop Each process_run
ACG->>Backend: load_stability(profile_key)
alt cached converged=true
ACG->>ACG: reuse cached stability counts
else converged=false
ACG->>ACG: build rolling observation window
ACG->>ACG: analyze_stability()
ACG->>Detector: record_epoch(betti, drift, error)
Detector-->>ACG: is_converged() && epoch >= stability_window
alt converged confirmed
ACG->>Backend: store_stability(converged=true)
else not yet converged
ACG->>Backend: store_stability(converged=false)
end
end
end
sequenceDiagram
participant Intercept as AdaptiveHintsIntercept
participant Governor as HintGovernor
participant GGov as GeometricGovernor
participant Request as LLM Request
Intercept->>Intercept: build hints from HotCache
Intercept->>Intercept: should_inject_hints(hints, manual_ls)
alt manual latency override set
Intercept->>Request: inject AgentHints (forced)
else no manual override
alt HintGovernor configured
Intercept->>Governor: allow(latency_sensitivity)
Governor->>GGov: should_trigger(deviation)
GGov-->>Governor: bool
Governor->>GGov: adapt(observed_rate, dt)
alt allowed
Intercept->>Request: inject AgentHints
else shed
Intercept->>Request: skip injection
end
else no governor
Intercept->>Request: inject AgentHints
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/adaptive/tests/unit/storage_tests.rs (1)
82-94: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winAssert
convergedin stability round-trip tests.The fixture now includes
converged, but the round-trip test only checksstable_prefix_lengthandtotal_observations. Add an assertion forloaded_stability.convergedto lock this storage contract, since runtime behavior reads this field.Proposed test assertion
assert_eq!(loaded_stability.stable_prefix_length, 1); assert_eq!(loaded_stability.total_observations, 3); + assert!(!loaded_stability.converged);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/adaptive/tests/unit/storage_tests.rs` around lines 82 - 94, In the round-trip test that uses the sample_stability fixture, add an assertion to verify that loaded_stability.converged equals the value set in the sample_stability function (which is false). This assertion should be added alongside the existing assertions for stable_prefix_length and total_observations to ensure the converged field is properly persisted and loaded during the serialization round-trip.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/adaptive-topology/src/convergence.rs`:
- Around line 150-156: The stability_window parameter in the new method is only
being clamped to a minimum value using max(MIN_STABILITY_WINDOW), but is not
clamped to a maximum value. Since the RingBuffer instances (betti_history,
drift_history, error_history) have a fixed capacity of MAX_HISTORY (32), if
stability_window exceeds this capacity, methods like is_betti_stable and
is_drift_decreasing will never satisfy their length checks, making
topology-based convergence unreachable. Apply both minimum and maximum clamping
to stability_window by using a method that clamps it between
MIN_STABILITY_WINDOW and MAX_HISTORY.
- Line 166: The `epoch` field is typed as `u32` and the increment operation on
line 166 (self.epoch += 1) can cause integer overflow and wraparound in
long-running detectors, breaking epoch-based gating and diagnostics. Change the
type of the `epoch` field from `u32` to `u64` throughout the convergence module
to prevent wraparound and ensure reliable epoch tracking for the lifetime of the
detector.
In `@crates/adaptive-topology/src/geometry.rs`:
- Around line 252-257: The pruning_ratio method is counting true values across
the entire active_mask array, but only the first self.counts[0] entries
represent actual populated level-0 blocks. When the mask contains true entries
beyond self.counts[0], this causes the calculation to produce invalid ratios.
Restrict the active count calculation to only the first self.counts[0] elements
of active_mask by using slice notation to iterate over
active_mask[..self.counts[0]] instead of the full active_mask.
In `@crates/adaptive-topology/src/manifold.rs`:
- Around line 91-100: The assertion in the `new` method uses `D * tau` as the
validation threshold, but the actual embedding definition only requires `(D - 1)
* tau + 1` samples. Update the assertion logic to use the correct formula `(D -
1) * tau + 1` instead of `D * tau`, and also apply the same correction to the
embed readiness threshold check around lines 121-123 to ensure consistency and
avoid unnecessarily delaying valid embeddings.
- Around line 239-246: The issue in the compute_betti_0 function is that nodes
are not marked as visited when pushed onto the stack, only when popped, causing
the same neighbor to be pushed multiple times. Under dense connectivity, this
wastes the fixed-size stack space and can cause the stack to overflow, skipping
reachable nodes and producing incorrect Betti number calculations. Fix this by
marking a neighbor as visited immediately when it is pushed onto the stack in
the section where stack[stack_top] = neighbor is executed, rather than deferring
the visited marking until pop-time, to prevent duplicate pushes of the same
neighbor.
In `@crates/adaptive/src/acg_learner.rs`:
- Around line 185-208: The issue is that store_stability is being called before
store_observations, which means if observation storage fails after stability is
marked as converged, the next run will skip observation repair permanently. To
fix this, locate all places where store_stability and store_observations are
called together (including the instances at lines 223-229 and 235-237 mentioned
in the comment), and reorder these calls so that store_observations is always
called before store_stability. This ensures that if observation storage fails,
the profile won't yet be marked as converged and can be retried on the next run.
In `@python/nemo_relay/_native.pyi`:
- Around line 1235-1236: The DriftDetector.update method stub currently accepts
Sequence[float] for the centroid parameter, but the native binding requires a
fixed 3-element array. Change the centroid parameter type from Sequence[float]
to a fixed-size type representation (such as a tuple of exactly three floats) to
match the native binding's expectations. Additionally, update the docstring to
explicitly clarify that the centroid must be a 3-dimensional coordinate with
exactly three float values.
---
Outside diff comments:
In `@crates/adaptive/tests/unit/storage_tests.rs`:
- Around line 82-94: In the round-trip test that uses the sample_stability
fixture, add an assertion to verify that loaded_stability.converged equals the
value set in the sample_stability function (which is false). This assertion
should be added alongside the existing assertions for stable_prefix_length and
total_observations to ensure the converged field is properly persisted and
loaded during the serialization round-trip.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 8b2fd9ec-2173-410e-a7fa-73fc991c831b
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (66)
.pre-commit-config.yamlATTRIBUTIONS-Rust.mdCargo.tomlcrates/adaptive-topology/Cargo.tomlcrates/adaptive-topology/README.mdcrates/adaptive-topology/src/convergence.rscrates/adaptive-topology/src/drift.rscrates/adaptive-topology/src/geometry.rscrates/adaptive-topology/src/governor.rscrates/adaptive-topology/src/lib.rscrates/adaptive-topology/src/manifold.rscrates/adaptive-topology/src/topology.rscrates/adaptive/Cargo.tomlcrates/adaptive/README.mdcrates/adaptive/benches/convergence_bench.rscrates/adaptive/src/acg/stability.rscrates/adaptive/src/acg_learner.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/config.rscrates/adaptive/src/lib.rscrates/adaptive/src/plugin_component.rscrates/adaptive/src/runtime/features.rscrates/adaptive/src/runtime/validation.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/tests/integration/runtime_integration_tests.rscrates/adaptive/tests/integration/tool_parallelism_plan_tests.rscrates/adaptive/tests/integration/topology_convergence_tests.rscrates/adaptive/tests/unit/acg/economics_internal_tests.rscrates/adaptive/tests/unit/acg/economics_policy_tests.rscrates/adaptive/tests/unit/acg/multi_breakpoint_tests.rscrates/adaptive/tests/unit/acg_component_tests.rscrates/adaptive/tests/unit/adaptive_hints_intercept_tests.rscrates/adaptive/tests/unit/cache_diagnostics_tests.rscrates/adaptive/tests/unit/config_tests.rscrates/adaptive/tests/unit/intercepts_tests.rscrates/adaptive/tests/unit/plugin_component_tests.rscrates/adaptive/tests/unit/runtime_features_tests.rscrates/adaptive/tests/unit/runtime_tests.rscrates/adaptive/tests/unit/storage_memory_internal_tests.rscrates/adaptive/tests/unit/storage_tests.rscrates/adaptive/tests/unit/tool_parallelism_learner_tests.rscrates/adaptive/tests/unit/types_tests.rscrates/node/adaptive.d.tscrates/node/adaptive.jscrates/node/tests/adaptive_tests.mjscrates/python/Cargo.tomlcrates/python/src/lib.rscrates/python/src/py_adaptive_topology.rscrates/python/tests/coverage/py_storage_coverage_tests.rsdocs/adaptive-plugin/about.mdxdocs/adaptive-plugin/acg.mdxdocs/adaptive-plugin/adaptive-hints.mdxdocs/adaptive-plugin/configuration.mdxgo/nemo_relay/adaptive.gogo/nemo_relay/adaptive/adaptive.gogo/nemo_relay/adaptive_test.gopython/nemo_relay/__init__.pypython/nemo_relay/__init__.pyipython/nemo_relay/_native.pyipython/nemo_relay/adaptive.pypython/nemo_relay/adaptive.pyipython/nemo_relay/adaptive_topology.pypython/nemo_relay/adaptive_topology.pyipython/tests/test_adaptive.pypython/tests/test_adaptive_config.pypython/tests/test_adaptive_topology.py
| pub fn new(epsilon: f64, stability_window: usize) -> Self { | ||
| Self { | ||
| betti_history: RingBuffer::new(), | ||
| drift_history: RingBuffer::new(), | ||
| error_history: RingBuffer::new(), | ||
| stability_window: stability_window.max(MIN_STABILITY_WINDOW), | ||
| epsilon: sanitize_positive(epsilon, DEFAULT_EPSILON), |
There was a problem hiding this comment.
Clamp stability_window to ring-buffer capacity.
At Line 155, only a minimum clamp is applied. If callers pass a value larger than MAX_HISTORY (32), is_betti_stable and is_drift_decreasing can never satisfy their length checks, so topology-based convergence becomes unreachable.
Proposed fix
- stability_window: stability_window.max(MIN_STABILITY_WINDOW),
+ stability_window: stability_window.clamp(MIN_STABILITY_WINDOW, MAX_HISTORY),🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/adaptive-topology/src/convergence.rs` around lines 150 - 156, The
stability_window parameter in the new method is only being clamped to a minimum
value using max(MIN_STABILITY_WINDOW), but is not clamped to a maximum value.
Since the RingBuffer instances (betti_history, drift_history, error_history)
have a fixed capacity of MAX_HISTORY (32), if stability_window exceeds this
capacity, methods like is_betti_stable and is_drift_decreasing will never
satisfy their length checks, making topology-based convergence unreachable.
Apply both minimum and maximum clamping to stability_window by using a method
that clamps it between MIN_STABILITY_WINDOW and MAX_HISTORY.
| self.betti_history.push(betti); | ||
| self.drift_history.push(sanitize_non_negative(drift)); | ||
| self.error_history.push(sanitize_non_negative(error)); | ||
| self.epoch += 1; |
There was a problem hiding this comment.
Prevent epoch wraparound in long-running detectors.
Line 166 uses u32 += 1. This can overflow and wrap, which can break downstream epoch-based gating and diagnostics.
Proposed fix
- self.epoch += 1;
+ self.epoch = self.epoch.saturating_add(1);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/adaptive-topology/src/convergence.rs` at line 166, The `epoch` field
is typed as `u32` and the increment operation on line 166 (self.epoch += 1) can
cause integer overflow and wraparound in long-running detectors, breaking
epoch-based gating and diagnostics. Change the type of the `epoch` field from
`u32` to `u64` throughout the convergence module to prevent wraparound and
ensure reliable epoch tracking for the lifetime of the detector.
| pub fn pruning_ratio(&self, active_mask: &[bool; MAX_BLOCKS]) -> f64 { | ||
| if self.counts[0] == 0 { | ||
| return 0.0; | ||
| } | ||
| let active = active_mask.iter().filter(|&&x| x).count(); | ||
| 1.0 - (active as f64 / self.counts[0] as f64) |
There was a problem hiding this comment.
Restrict pruning math to populated level-0 blocks.
At Line 256, active is counted across all MAX_BLOCKS. For caller-provided masks, true entries beyond self.counts[0] can produce invalid ratios (including negatives). Count only the tracked prefix.
Proposed fix
- let active = active_mask.iter().filter(|&&x| x).count();
+ let active = active_mask[..self.counts[0]]
+ .iter()
+ .filter(|&&x| x)
+ .count();📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| pub fn pruning_ratio(&self, active_mask: &[bool; MAX_BLOCKS]) -> f64 { | |
| if self.counts[0] == 0 { | |
| return 0.0; | |
| } | |
| let active = active_mask.iter().filter(|&&x| x).count(); | |
| 1.0 - (active as f64 / self.counts[0] as f64) | |
| pub fn pruning_ratio(&self, active_mask: &[bool; MAX_BLOCKS]) -> f64 { | |
| if self.counts[0] == 0 { | |
| return 0.0; | |
| } | |
| let active = active_mask[..self.counts[0]] | |
| .iter() | |
| .filter(|&&x| x) | |
| .count(); | |
| 1.0 - (active as f64 / self.counts[0] as f64) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/adaptive-topology/src/geometry.rs` around lines 252 - 257, The
pruning_ratio method is counting true values across the entire active_mask
array, but only the first self.counts[0] entries represent actual populated
level-0 blocks. When the mask contains true entries beyond self.counts[0], this
causes the calculation to produce invalid ratios. Restrict the active count
calculation to only the first self.counts[0] elements of active_mask by using
slice notation to iterate over active_mask[..self.counts[0]] instead of the full
active_mask.
| /// Panics if `D * tau` exceeds `EMBED_BUFFER_CAPACITY`. This is a | ||
| /// programming error, not a runtime failure. | ||
| pub fn new(tau: usize) -> Self { | ||
| let tau = if tau == 0 { DEFAULT_TAU } else { tau }; | ||
| assert!( | ||
| D * tau <= EMBED_BUFFER_CAPACITY, | ||
| "TimeDelayEmbedder D * tau ({}) exceeds EMBED_BUFFER_CAPACITY ({})", | ||
| D * tau, | ||
| EMBED_BUFFER_CAPACITY | ||
| ); |
There was a problem hiding this comment.
Embed readiness threshold is too strict for tau > 1.
Line 121 uses D * tau, but the embedding definition in Line 74 only needs samples through x(t-(D-1)tau), i.e. (D - 1) * tau + 1. This delays valid embeddings and over-restricts constructor validation.
Proposed fix
- assert!(
- D * tau <= EMBED_BUFFER_CAPACITY,
- "TimeDelayEmbedder D * tau ({}) exceeds EMBED_BUFFER_CAPACITY ({})",
- D * tau,
- EMBED_BUFFER_CAPACITY
- );
+ let required = if D == 0 {
+ 0
+ } else {
+ (D - 1).saturating_mul(tau).saturating_add(1)
+ };
+ assert!(
+ required <= EMBED_BUFFER_CAPACITY,
+ "TimeDelayEmbedder required samples ({}) exceed EMBED_BUFFER_CAPACITY ({})",
+ required,
+ EMBED_BUFFER_CAPACITY
+ );
@@
- let required = D * self.tau;
+ let required = if D == 0 {
+ 0
+ } else {
+ (D - 1).saturating_mul(self.tau).saturating_add(1)
+ };
if self.buffer_len < required {
return None;
}Also applies to: 121-123
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/adaptive-topology/src/manifold.rs` around lines 91 - 100, The
assertion in the `new` method uses `D * tau` as the validation threshold, but
the actual embedding definition only requires `(D - 1) * tau + 1` samples.
Update the assertion logic to use the correct formula `(D - 1) * tau + 1`
instead of `D * tau`, and also apply the same correction to the embed readiness
threshold check around lines 121-123 to ensure consistency and avoid
unnecessarily delaying valid embeddings.
da2470a to
17488c8
Compare
Signed-off-by: teerth sharma <teerths57@gmail.com>
17488c8 to
24ed6f2
Compare
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/adaptive/tests/unit/storage_tests.rs (1)
243-253: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick winAssert the new
convergedfield in the stability round-trip test.Line 244 loads the full stability record, but the test never checks
converged. Add an explicit assertion so storage regressions on this field are caught.Suggested diff
assert_eq!(loaded_stability.stable_prefix_length, 1); assert_eq!(loaded_stability.total_observations, 3); + assert!(!loaded_stability.converged);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/adaptive/tests/unit/storage_tests.rs` around lines 243 - 253, The stability round-trip test loads the full stability record into the loaded_stability variable but does not assert the converged field, which could allow storage regressions on this field to go undetected. Add an explicit assertion immediately after the existing stability assertions (after the assertion for loaded_stability.total_observations) to verify the converged field has the expected value, ensuring the field is properly persisted and retrieved during the round-trip.
♻️ Duplicate comments (2)
crates/adaptive-topology/src/manifold.rs (2)
91-100:⚠️ Potential issue | 🟠 Major | ⚡ Quick winUse the mathematically correct embedding readiness threshold.
Line 91 and Line 121 use
D * tau, but the embedding in Line 74 ([x(t), x(t-tau), ..., x(t-(D-1)tau)]) only requires(D - 1) * tau + 1samples. Current logic rejects valid embeddings fortau > 1.Proposed fix
pub fn new(tau: usize) -> Self { let tau = if tau == 0 { DEFAULT_TAU } else { tau }; + let required = if D == 0 { + 0 + } else { + (D - 1).saturating_mul(tau).saturating_add(1) + }; assert!( - D * tau <= EMBED_BUFFER_CAPACITY, - "TimeDelayEmbedder D * tau ({}) exceeds EMBED_BUFFER_CAPACITY ({})", - D * tau, + required <= EMBED_BUFFER_CAPACITY, + "TimeDelayEmbedder required samples ({}) exceed EMBED_BUFFER_CAPACITY ({})", + required, EMBED_BUFFER_CAPACITY ); @@ pub fn embed(&self) -> Option<ManifoldPoint<D>> { - let required = D * self.tau; + let required = if D == 0 { + 0 + } else { + (D - 1).saturating_mul(self.tau).saturating_add(1) + }; if self.buffer_len < required { return None; }Also applies to: 121-123
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/adaptive-topology/src/manifold.rs` around lines 91 - 100, The embedding readiness threshold is mathematically incorrect. The embedding pattern requires (D - 1) * tau + 1 samples, not D * tau. Update the assertion in the new method (where tau is validated) to use (D - 1) * tau + 1 instead of D * tau. Also apply the same fix to line 121-123 which has the same incorrect threshold check. This correction will allow valid embeddings to proceed instead of rejecting them unnecessarily when tau is greater than 1.
225-246:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMark vertices visited at push-time in DFS to avoid duplicate stack entries.
In Line 239-Line 246, neighbors are marked visited only after pop. That allows duplicate pushes for the same vertex, which can consume the fixed stack and skip reachable vertices, producing incorrect
β₀.Proposed fix
components += 1; let mut stack = [0usize; MAX_GRAPH_POINTS]; let mut stack_top = 1; stack[0] = start; + visited[start] = true; while stack_top > 0 { stack_top -= 1; let current = stack[stack_top]; - if visited[current] { - continue; - } - visited[current] = true; - - for (neighbor, is_visited) in visited.iter().enumerate().take(self.point_count) { - if !is_visited - && self.are_neighbors(current, neighbor) - && stack_top < MAX_GRAPH_POINTS - { + for neighbor in 0..self.point_count { + if !visited[neighbor] && self.are_neighbors(current, neighbor) { + visited[neighbor] = true; stack[stack_top] = neighbor; stack_top += 1; } } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/adaptive-topology/src/manifold.rs` around lines 225 - 246, The DFS traversal marks vertices as visited only after they are popped from the stack (when visited[current] is set to true), which allows the same vertex to be pushed onto the stack multiple times before it gets processed. This causes duplicate stack entries that can overflow the fixed-size stack and skip reachable vertices, resulting in incorrect component counting. Move the visited marking to happen when pushing neighbors onto the stack rather than when popping them. In the neighbor iteration loop where neighbors are added to the stack (around lines 239-246), set visited[neighbor] = true immediately before pushing the neighbor onto the stack to prevent duplicate pushes of the same vertex.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/adaptive-topology/README.md`:
- Around line 33-35: The code example block in the README.md file starts
immediately after the "## Example" heading without a complete-sentence
introduction. Add a descriptive sentence between the heading and the opening
triple backticks that explains what the code example demonstrates, following the
documentation style guideline that requires every code block to be introduced
with a complete sentence.
In `@crates/adaptive-topology/src/convergence.rs`:
- Around line 237-239: In the betti_score calculation within the
convergence_score method, the variations count is being normalized by
self.stability_window when it should be normalized by the actual number of
transitions. Since variations are counted from window[..count].windows(2), which
produces count - 1 adjacent pairs, change the denominator in the betti_score
assignment from self.stability_window to (count - 1) to correctly normalize the
variation ratio.
In `@crates/adaptive-topology/src/drift.rs`:
- Around line 40-60: Add a validation check at the beginning of the update
method to ensure all values in the centroid array are finite (not NaN or Inf).
If any non-finite value is detected, return early with a drift value of 0.0 to
prevent non-finite values from being written into the internal state fields
(previous, expected, velocity). This guards against NaN values persisting in the
drift and velocity calculations, which would disable meaningful drift checks.
In `@crates/adaptive/tests/unit/adaptive_hints_intercept_tests.rs`:
- Around line 361-362: The current assertion only verifies that the agent hints
header is absent from the request, but it does not verify that the agent hints
data is also absent from the request body. To harden the test, add a negative
assertion that checks the request body does not contain the nvext.agent_hints
field after the existing header check on line 361. This ensures that if a
regression accidentally injects nvext.agent_hints in the body while skipping the
header, the test will catch it and fail as expected.
In `@crates/adaptive/tests/unit/runtime_tests.rs`:
- Around line 322-375: The test
validate_config_reports_invalid_topology_numeric_fields should verify
field-level diagnostics for all invalid fields within components, not just
component-level diagnostics. Currently it checks for component paths like
"adaptive_hints.governor" and "acg.convergence", but it should also assert that
diagnostics exist for specific invalid fields within those components. For the
acg.convergence and convergence components which each have multiple invalid
fields (epsilon and stability_window), add additional assertions to the
report.diagnostics iteration to verify that field-level diagnostics are reported
for paths like "acg.convergence.epsilon", "acg.convergence.stability_window",
"convergence.epsilon", and "convergence.stability_window" to ensure all invalid
numeric fields are being validated and prevent silent validator regressions.
In `@docs/adaptive-plugin/about.mdx`:
- Around line 39-40: The documentation uses inconsistent terminology for the
same concept: "topology-aware" appears in one location while "topology-inspired"
appears in another location (in the line mentioning "topology-inspired
signals"). Standardize on a single term throughout the document by identifying
all instances of both "topology-aware" and "topology-inspired" and replacing
them with one consistent term. Ensure the chosen term is applied uniformly
across the entire file to maintain clarity and avoid ambiguity for readers.
In `@docs/adaptive-plugin/configuration.mdx`:
- Around line 221-233: The Rust example for tool_parallelism configuration uses
ToolParallelismComponentConfig::default() implicitly, while the TOML, Python,
and Node examples all explicitly demonstrate the drift configuration. Update the
line setting adaptive.tool_parallelism to explicitly define the drift field
configuration instead of relying on the default() method, ensuring the Rust
example maintains parity with the other language examples in the documentation.
---
Outside diff comments:
In `@crates/adaptive/tests/unit/storage_tests.rs`:
- Around line 243-253: The stability round-trip test loads the full stability
record into the loaded_stability variable but does not assert the converged
field, which could allow storage regressions on this field to go undetected. Add
an explicit assertion immediately after the existing stability assertions (after
the assertion for loaded_stability.total_observations) to verify the converged
field has the expected value, ensuring the field is properly persisted and
retrieved during the round-trip.
---
Duplicate comments:
In `@crates/adaptive-topology/src/manifold.rs`:
- Around line 91-100: The embedding readiness threshold is mathematically
incorrect. The embedding pattern requires (D - 1) * tau + 1 samples, not D *
tau. Update the assertion in the new method (where tau is validated) to use (D -
1) * tau + 1 instead of D * tau. Also apply the same fix to line 121-123 which
has the same incorrect threshold check. This correction will allow valid
embeddings to proceed instead of rejecting them unnecessarily when tau is
greater than 1.
- Around line 225-246: The DFS traversal marks vertices as visited only after
they are popped from the stack (when visited[current] is set to true), which
allows the same vertex to be pushed onto the stack multiple times before it gets
processed. This causes duplicate stack entries that can overflow the fixed-size
stack and skip reachable vertices, resulting in incorrect component counting.
Move the visited marking to happen when pushing neighbors onto the stack rather
than when popping them. In the neighbor iteration loop where neighbors are added
to the stack (around lines 239-246), set visited[neighbor] = true immediately
before pushing the neighbor onto the stack to prevent duplicate pushes of the
same vertex.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: ccb1a467-92d2-498a-ae76-c078c7204100
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (66)
.pre-commit-config.yamlATTRIBUTIONS-Rust.mdCargo.tomlcrates/adaptive-topology/Cargo.tomlcrates/adaptive-topology/README.mdcrates/adaptive-topology/src/convergence.rscrates/adaptive-topology/src/drift.rscrates/adaptive-topology/src/geometry.rscrates/adaptive-topology/src/governor.rscrates/adaptive-topology/src/lib.rscrates/adaptive-topology/src/manifold.rscrates/adaptive-topology/src/topology.rscrates/adaptive/Cargo.tomlcrates/adaptive/README.mdcrates/adaptive/benches/convergence_bench.rscrates/adaptive/src/acg/stability.rscrates/adaptive/src/acg_learner.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/config.rscrates/adaptive/src/lib.rscrates/adaptive/src/plugin_component.rscrates/adaptive/src/runtime/features.rscrates/adaptive/src/runtime/validation.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/tests/integration/runtime_integration_tests.rscrates/adaptive/tests/integration/tool_parallelism_plan_tests.rscrates/adaptive/tests/integration/topology_convergence_tests.rscrates/adaptive/tests/unit/acg/economics_internal_tests.rscrates/adaptive/tests/unit/acg/economics_policy_tests.rscrates/adaptive/tests/unit/acg/multi_breakpoint_tests.rscrates/adaptive/tests/unit/acg_component_tests.rscrates/adaptive/tests/unit/adaptive_hints_intercept_tests.rscrates/adaptive/tests/unit/cache_diagnostics_tests.rscrates/adaptive/tests/unit/config_tests.rscrates/adaptive/tests/unit/intercepts_tests.rscrates/adaptive/tests/unit/plugin_component_tests.rscrates/adaptive/tests/unit/runtime_features_tests.rscrates/adaptive/tests/unit/runtime_tests.rscrates/adaptive/tests/unit/storage_memory_internal_tests.rscrates/adaptive/tests/unit/storage_tests.rscrates/adaptive/tests/unit/tool_parallelism_learner_tests.rscrates/adaptive/tests/unit/types_tests.rscrates/node/adaptive.d.tscrates/node/adaptive.jscrates/node/tests/adaptive_tests.mjscrates/python/Cargo.tomlcrates/python/src/lib.rscrates/python/src/py_adaptive_topology.rscrates/python/tests/coverage/py_storage_coverage_tests.rsdocs/adaptive-plugin/about.mdxdocs/adaptive-plugin/acg.mdxdocs/adaptive-plugin/adaptive-hints.mdxdocs/adaptive-plugin/configuration.mdxgo/nemo_relay/adaptive.gogo/nemo_relay/adaptive/adaptive.gogo/nemo_relay/adaptive_test.gopython/nemo_relay/__init__.pypython/nemo_relay/__init__.pyipython/nemo_relay/_native.pyipython/nemo_relay/adaptive.pypython/nemo_relay/adaptive.pyipython/nemo_relay/adaptive_topology.pypython/nemo_relay/adaptive_topology.pyipython/tests/test_adaptive.pypython/tests/test_adaptive_config.pypython/tests/test_adaptive_topology.py
| let variations = window[..count].windows(2).filter(|w| w[0] != w[1]).count(); | ||
| let betti_score = 1.0 - (variations as f64 / self.stability_window as f64); | ||
| score += BETTI_STABILITY_WEIGHT * betti_score; |
There was a problem hiding this comment.
Normalize Betti variation by transition count, not window length.
Line [238] divides by self.stability_window, but variations are counted across adjacent pairs (count - 1 transitions). This systematically overstates Betti stability in convergence_score.
Suggested fix
let variations = window[..count].windows(2).filter(|w| w[0] != w[1]).count();
-let betti_score = 1.0 - (variations as f64 / self.stability_window as f64);
+let transitions = count.saturating_sub(1).max(1);
+let betti_score = 1.0 - (variations as f64 / transitions as f64);
score += BETTI_STABILITY_WEIGHT * betti_score;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/adaptive-topology/src/convergence.rs` around lines 237 - 239, In the
betti_score calculation within the convergence_score method, the variations
count is being normalized by self.stability_window when it should be normalized
by the actual number of transitions. Since variations are counted from
window[..count].windows(2), which produces count - 1 adjacent pairs, change the
denominator in the betti_score assignment from self.stability_window to (count -
1) to correctly normalize the variation ratio.
| pub fn update(&mut self, centroid: &[f64; D]) -> f64 { | ||
| let drift = if self.has_previous { | ||
| l2_distance(&self.expected, centroid) | ||
| } else { | ||
| 0.0 | ||
| }; | ||
|
|
||
| if self.has_previous { | ||
| for (d, vel) in self.velocity.iter_mut().enumerate().take(D) { | ||
| *vel = centroid[d] - self.previous[d]; | ||
| } | ||
| } | ||
|
|
||
| for (d, exp) in self.expected.iter_mut().enumerate().take(D) { | ||
| *exp = centroid[d] + self.velocity[d]; | ||
| } | ||
|
|
||
| self.previous = *centroid; | ||
| self.has_previous = true; | ||
|
|
||
| drift |
There was a problem hiding this comment.
Guard update against non-finite centroids to prevent persistent NaN state.
At Line [40], non-finite centroid values are accepted and then written into previous/expected/velocity. After that, drift and velocity calculations can stay NaN and effectively disable meaningful drift checks.
Suggested fix
pub fn update(&mut self, centroid: &[f64; D]) -> f64 {
+ if centroid.iter().any(|coord| !coord.is_finite()) {
+ self.reset();
+ return f64::INFINITY;
+ }
+
let drift = if self.has_previous {
l2_distance(&self.expected, centroid)
} else {
0.0
};🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/adaptive-topology/src/drift.rs` around lines 40 - 60, Add a validation
check at the beginning of the update method to ensure all values in the centroid
array are finite (not NaN or Inf). If any non-finite value is detected, return
early with a drift value of 0.0 to prevent non-finite values from being written
into the internal state fields (previous, expected, velocity). This guards
against NaN values persisting in the drift and velocity calculations, which
would disable meaningful drift checks.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Teerth Sharma <teerths57@gmail.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/adaptive-plugin/about.mdx`:
- Around line 39-40: The file docs/adaptive-plugin/about.mdx is missing the
required SPDX license header at the top. Add the SPDX header using JSX comment
delimiters {/* ... */} at the very beginning of the file before the YAML
frontmatter. The header should include the SPDX-FileCopyrightText line with the
copyright notice and the SPDX-License-Identifier line set to Apache-2.0, wrapped
in the JSX comment syntax as specified in the review comment.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 289cbccd-452e-4a63-9da2-9d593edfa1a4
📒 Files selected for processing (1)
docs/adaptive-plugin/about.mdx
📜 Review details
🧰 Additional context used
📓 Path-based instructions (12)
{docs/**,README.md,CONTRIBUTING.md}
📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)
{docs/**,README.md,CONTRIBUTING.md}: For docs-only changes, run targeted checks only if commands, package names, or examples changed. Usejust docsfor docs-site builds andjust docs-linkcheckwhen links changed
Run docs site build withjust docs
Files:
docs/adaptive-plugin/about.mdx
{docs/**,README.md,CONTRIBUTING.md,**/*.md}
📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)
Run docs link validation with
just docs-linkcheckwhen links change
Files:
docs/adaptive-plugin/about.mdx
{docs/**,README.md}
📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)
Verify README and docs entry points still match current package names and paths for large or public-facing changes
Files:
docs/adaptive-plugin/about.mdx
{docs/**,examples/**,README.md}
📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)
Verify examples still run with documented commands for large or public-facing changes
Files:
docs/adaptive-plugin/about.mdx
{docs/**,README.md,**/Cargo.toml,**/package.json,**/*.md}
📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)
Ensure renamed public surfaces are reflected consistently in manifests and docs for large or public-facing changes
Files:
docs/adaptive-plugin/about.mdx
**/*.{md,mdx,py,sh,yaml,yml,toml,json}
📄 CodeRabbit inference engine (.agents/skills/contribute-docs/SKILL.md)
Keep package names, repo references, and build commands current
Files:
docs/adaptive-plugin/about.mdx
**/*.mdx
📄 CodeRabbit inference engine (.agents/skills/contribute-docs/SKILL.md)
In MDX files, top-of-file comments must use JSX comment delimiters: {/* to open and */} to close. Do not use HTML comments for MDX SPDX headers.
MDX top-of-file SPDX comments must use {/* ... */} delimiters instead of HTML comment delimiters (Must-Fix)
Files:
docs/adaptive-plugin/about.mdx
**/*.{html,md,mdx}
📄 CodeRabbit inference engine (CONTRIBUTING.md)
Include SPDX license header in HTML and Markdown files using HTML comment syntax
Files:
docs/adaptive-plugin/about.mdx
docs/**/*.{md,mdx}
📄 CodeRabbit inference engine (CONTRIBUTING.md)
Update embedded documentation snippets, patch docs, and binding-support notes if examples or supported bindings changed
Files:
docs/adaptive-plugin/about.mdx
docs/**
📄 CodeRabbit inference engine (CONTRIBUTING.md)
Run
just docsor./scripts/build-docs.sh htmlto regenerate ignored Fern API reference pages before validation for documentation site changes
Files:
docs/adaptive-plugin/about.mdx
{docs/**,README.md,CONTRIBUTING.md,RELEASING.md,SECURITY.md}
⚙️ CodeRabbit configuration file
{docs/**,README.md,CONTRIBUTING.md,RELEASING.md,SECURITY.md}: Review documentation for technical accuracy against the current API, command correctness, and consistency across language bindings.
Flag stale examples, missing SPDX headers where required, and instructions that no longer match CI or pre-commit behavior.
Files:
docs/adaptive-plugin/about.mdx
**
⚙️ CodeRabbit configuration file
**:AGENTS.md
This file provides guidance to agents, including Claude Code and OpenAI Codex, when working in this repository.
Project Overview
NeMo Relay is a multi-language agent runtime framework for execution scopes, lifecycle events, middleware, plugins, and observability around tool and LLM calls. The core runtime is Rust. Primary supported bindings are Rust, Python, and Node.js. Go, WebAssembly, and the raw C FFI are experimental and source-first.
The shared runtime model is:
- Scope stacks decide where work belongs and which scope-local behavior is visible.
- Middleware registries decide what guardrails and intercepts run around managed calls.
- Plugins install reusable runtime behavior from configuration.
- Events record runtime behavior in ATOF form.
- Subscribers and exporters consume events in-process or export them to ATIF, OpenTelemetry, OpenInference, or other backends.
Repository Structure
The repository layout separates the Rust runtime, language bindings, documentation,
integration patches, and agent-facing skills.crates/ core/ # Rust core runtime crate, published as nemo-relay adaptive/ # Adaptive runtime primitives and plugin components python/ # PyO3 native extension for the Python package ffi/ # Raw C ABI layer used by downstream bindings such as Go node/ # NAPI Node.js binding and JavaScript/TypeScript entry points wasm/ # wasm-bindgen WebAssembly binding and JS wrappers python/ nemo_relay/ # Python wrapper package: scopes, tools, LLM, middleware, typed helpers, plugins, adaptive helpers tests/ # Python tests go/ nemo_relay/ # Experimental Go CGo binding and tests fern/ # Fern documentation site scripts/ # Stable wrappers and helper scripts; build/test/docs entry points live in justfile third_party/ # P...
Files:
docs/adaptive-plugin/about.mdx
🔇 Additional comments (2)
docs/adaptive-plugin/about.mdx (2)
39-40: Terminology consistency already addressed.The past review flagged inconsistency between "topology-aware" and "topology-inspired" at line 39. The current code correctly uses "topology-aware signals stabilize," which aligns with terminology used earlier in the document and throughout the adaptive plugin docs. This concern has been resolved.
39-40: New documentation bullets are technically accurate and well-motivated.
- Line 39: "Stop adaptive learning once observed topology-aware signals stabilize" directly reflects convergence detection via Betti-number stability (convergence detector in PR).
- Line 40: "Invalidate stale tool parallelism plans when cohort shape drifts" aligns with drift-based plan invalidation behavior implemented in tool-parallelism learner.
Both bullets are clear, actionable, and grounded in the topology-aware adaptive feature set.
willkill07
left a comment
There was a problem hiding this comment.
Without a design document and clear measurable and observable improvements, I am extremely reluctant to spend the time to review this PR.
Additionally:
- no tests should be embedded in source files
- no new crates should be added if the behavior is internal only (which it sounds like adaptive-topology is
Signed-off-by: Teerth Sharma <teerths57@gmail.com>
|
Addressed the review concerns in
Validation run locally:
|
Signed-off-by: Teerth Sharma <teerths57@gmail.com>
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/adaptive/src/tool_parallelism_learner.rs (1)
73-75:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDrift invalidation should use an inclusive threshold.
At Line 74, using
>misses invalidation when drift is exactly at threshold.Suggested fix
- Ok(drift > config.threshold) + Ok(drift >= config.threshold)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/adaptive/src/tool_parallelism_learner.rs` around lines 73 - 75, In the return statement within the method that updates drift using detector.update(¢roid), change the comparison operator from > to >= when comparing drift against config.threshold. This ensures that invalidation occurs when drift equals the threshold value exactly, not just when it exceeds it. Update the condition Ok(drift > config.threshold) to use the inclusive >= operator instead.
♻️ Duplicate comments (1)
crates/adaptive/src/acg_learner.rs (1)
186-206:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPersisting stability before observations can strand profiles in a false converged state.
At Line 227, converged stability is stored before observations (Line 235). If observation storage fails, the next run can hit the converged fast-path at Line 186 and skip repair permanently.
Suggested fix
- if existing_stability + if existing_stability .as_ref() .map(|stability| stability.converged) .unwrap_or(false) + && existing.is_some() { @@ - if self.record_stability_epoch(&profile_key, &stability_result)? { - stability_result.converged = true; - } - - backend - .store_stability(&profile_key, &stability_result) - .await?; + let converged_now = self.record_stability_epoch(&profile_key, &stability_result)?; @@ backend .store_observations(&profile_key, &observations_vec) .await?; + + if converged_now { + stability_result.converged = true; + } + backend + .store_stability(&profile_key, &stability_result) + .await?;Also applies to: 226-236
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/adaptive/src/acg_learner.rs` around lines 186 - 206, The issue is that stability state is being persisted before observations are persisted, creating a race condition where if observation persistence fails, the converged flag in stability remains true, causing the fast-path at the beginning of the function (checking if existing_stability is converged) to incorrectly skip repair on the next run. Fix this by reversing the order of persistence operations so that observations are stored first before stability is persisted, ensuring that if observation storage fails, the converged state remains false and repair will be attempted on the next execution.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/adaptive/src/topology.rs`:
- Around line 135-147: The convergence check in the is_converged method
currently uses a short-circuit OR operator that can declare convergence based on
insufficient evidence, and the is_error_converged method only checks if the last
single error value is below epsilon rather than verifying stability over a
configured window. To fix this, modify is_error_converged to check that the last
N error samples in error_history are all consistently below epsilon threshold to
guarantee convergence over a window period, rather than checking only the latest
point. Additionally, change the logic operator in is_converged from OR to AND to
ensure that error convergence, betti stability, and drift decrease all must be
satisfied together before declaring the topology converged.
In `@crates/adaptive/tests/unit/topology_tests.rs`:
- Around line 27-33: The test function error_below_epsilon_converges expects
convergence after only a single low-error epoch, which may not align with the
intended window-based stability check. If the ConvergenceDetector requires a
stability window (indicated by the 3 parameter in ConvergenceDetector::new(0.01,
3)), then the test needs to record multiple consecutive epochs with low error
values before asserting is_converged() returns true. Either call record_epoch
multiple times with the same low-error pattern to fill the window, or change the
assertion to expect false convergence after a single epoch to properly test
window-based behavior.
---
Outside diff comments:
In `@crates/adaptive/src/tool_parallelism_learner.rs`:
- Around line 73-75: In the return statement within the method that updates
drift using detector.update(¢roid), change the comparison operator from > to
>= when comparing drift against config.threshold. This ensures that invalidation
occurs when drift equals the threshold value exactly, not just when it exceeds
it. Update the condition Ok(drift > config.threshold) to use the inclusive >=
operator instead.
---
Duplicate comments:
In `@crates/adaptive/src/acg_learner.rs`:
- Around line 186-206: The issue is that stability state is being persisted
before observations are persisted, creating a race condition where if
observation persistence fails, the converged flag in stability remains true,
causing the fast-path at the beginning of the function (checking if
existing_stability is converged) to incorrectly skip repair on the next run. Fix
this by reversing the order of persistence operations so that observations are
stored first before stability is persisted, ensuring that if observation storage
fails, the converged state remains false and repair will be attempted on the
next execution.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 2c6e25e2-9b5b-49f7-8fac-0fcddc8494b7
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (11)
crates/adaptive/Cargo.tomlcrates/adaptive/src/acg_learner.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/lib.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/src/topology.rscrates/adaptive/tests/unit/topology_tests.rsdocs/adaptive-plugin/about.mdxdocs/adaptive-plugin/acg.mdxdocs/adaptive-plugin/configuration.mdxdocs/adaptive-plugin/topology-aware-controls-design.mdx
💤 Files with no reviewable changes (1)
- crates/adaptive/Cargo.toml
📜 Review details
🧰 Additional context used
📓 Path-based instructions (26)
{docs/**,README.md,CONTRIBUTING.md}
📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)
{docs/**,README.md,CONTRIBUTING.md}: For docs-only changes, run targeted checks only if commands, package names, or examples changed. Usejust docsfor docs-site builds andjust docs-linkcheckwhen links changed
Run docs site build withjust docs
Files:
docs/adaptive-plugin/about.mdxdocs/adaptive-plugin/topology-aware-controls-design.mdxdocs/adaptive-plugin/acg.mdxdocs/adaptive-plugin/configuration.mdx
{docs/**,README.md,CONTRIBUTING.md,**/*.md}
📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)
Run docs link validation with
just docs-linkcheckwhen links change
Files:
docs/adaptive-plugin/about.mdxdocs/adaptive-plugin/topology-aware-controls-design.mdxdocs/adaptive-plugin/acg.mdxdocs/adaptive-plugin/configuration.mdx
{docs/**,README.md}
📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)
Verify README and docs entry points still match current package names and paths for large or public-facing changes
Files:
docs/adaptive-plugin/about.mdxdocs/adaptive-plugin/topology-aware-controls-design.mdxdocs/adaptive-plugin/acg.mdxdocs/adaptive-plugin/configuration.mdx
{docs/**,examples/**,README.md}
📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)
Verify examples still run with documented commands for large or public-facing changes
Files:
docs/adaptive-plugin/about.mdxdocs/adaptive-plugin/topology-aware-controls-design.mdxdocs/adaptive-plugin/acg.mdxdocs/adaptive-plugin/configuration.mdx
{docs/**,README.md,**/Cargo.toml,**/package.json,**/*.md}
📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)
Ensure renamed public surfaces are reflected consistently in manifests and docs for large or public-facing changes
Files:
docs/adaptive-plugin/about.mdxdocs/adaptive-plugin/topology-aware-controls-design.mdxdocs/adaptive-plugin/acg.mdxdocs/adaptive-plugin/configuration.mdx
**/*.{md,mdx,py,sh,yaml,yml,toml,json}
📄 CodeRabbit inference engine (.agents/skills/contribute-docs/SKILL.md)
Keep package names, repo references, and build commands current
Files:
docs/adaptive-plugin/about.mdxdocs/adaptive-plugin/topology-aware-controls-design.mdxdocs/adaptive-plugin/acg.mdxdocs/adaptive-plugin/configuration.mdx
**/*.mdx
📄 CodeRabbit inference engine (.agents/skills/contribute-docs/SKILL.md)
In MDX files, top-of-file comments must use JSX comment delimiters: {/* to open and */} to close. Do not use HTML comments for MDX SPDX headers.
MDX top-of-file SPDX comments must use {/* ... */} delimiters instead of HTML comment delimiters (Must-Fix)
Files:
docs/adaptive-plugin/about.mdxdocs/adaptive-plugin/topology-aware-controls-design.mdxdocs/adaptive-plugin/acg.mdxdocs/adaptive-plugin/configuration.mdx
**/*.{html,md,mdx}
📄 CodeRabbit inference engine (CONTRIBUTING.md)
Include SPDX license header in HTML and Markdown files using HTML comment syntax
Files:
docs/adaptive-plugin/about.mdxdocs/adaptive-plugin/topology-aware-controls-design.mdxdocs/adaptive-plugin/acg.mdxdocs/adaptive-plugin/configuration.mdx
docs/**/*.{md,mdx}
📄 CodeRabbit inference engine (CONTRIBUTING.md)
Update embedded documentation snippets, patch docs, and binding-support notes if examples or supported bindings changed
Files:
docs/adaptive-plugin/about.mdxdocs/adaptive-plugin/topology-aware-controls-design.mdxdocs/adaptive-plugin/acg.mdxdocs/adaptive-plugin/configuration.mdx
docs/**
📄 CodeRabbit inference engine (CONTRIBUTING.md)
Run
just docsor./scripts/build-docs.sh htmlto regenerate ignored Fern API reference pages before validation for documentation site changes
Files:
docs/adaptive-plugin/about.mdxdocs/adaptive-plugin/topology-aware-controls-design.mdxdocs/adaptive-plugin/acg.mdxdocs/adaptive-plugin/configuration.mdx
{docs/**,README.md,CONTRIBUTING.md,RELEASING.md,SECURITY.md}
⚙️ CodeRabbit configuration file
{docs/**,README.md,CONTRIBUTING.md,RELEASING.md,SECURITY.md}: Review documentation for technical accuracy against the current API, command correctness, and consistency across language bindings.
Flag stale examples, missing SPDX headers where required, and instructions that no longer match CI or pre-commit behavior.
Files:
docs/adaptive-plugin/about.mdxdocs/adaptive-plugin/topology-aware-controls-design.mdxdocs/adaptive-plugin/acg.mdxdocs/adaptive-plugin/configuration.mdx
**
⚙️ CodeRabbit configuration file
**:AGENTS.md
This file provides guidance to agents, including Claude Code and OpenAI Codex, when working in this repository.
Project Overview
NeMo Relay is a multi-language agent runtime framework for execution scopes, lifecycle events, middleware, plugins, and observability around tool and LLM calls. The core runtime is Rust. Primary supported bindings are Rust, Python, and Node.js. Go, WebAssembly, and the raw C FFI are experimental and source-first.
The shared runtime model is:
- Scope stacks decide where work belongs and which scope-local behavior is visible.
- Middleware registries decide what guardrails and intercepts run around managed calls.
- Plugins install reusable runtime behavior from configuration.
- Events record runtime behavior in ATOF form.
- Subscribers and exporters consume events in-process or export them to ATIF, OpenTelemetry, OpenInference, or other backends.
Repository Structure
The repository layout separates the Rust runtime, language bindings, documentation,
integration patches, and agent-facing skills.crates/ core/ # Rust core runtime crate, published as nemo-relay adaptive/ # Adaptive runtime primitives and plugin components python/ # PyO3 native extension for the Python package ffi/ # Raw C ABI layer used by downstream bindings such as Go node/ # NAPI Node.js binding and JavaScript/TypeScript entry points wasm/ # wasm-bindgen WebAssembly binding and JS wrappers python/ nemo_relay/ # Python wrapper package: scopes, tools, LLM, middleware, typed helpers, plugins, adaptive helpers tests/ # Python tests go/ nemo_relay/ # Experimental Go CGo binding and tests fern/ # Fern documentation site scripts/ # Stable wrappers and helper scripts; build/test/docs entry points live in justfile third_party/ # P...
Files:
docs/adaptive-plugin/about.mdxdocs/adaptive-plugin/topology-aware-controls-design.mdxcrates/adaptive/tests/unit/topology_tests.rsdocs/adaptive-plugin/acg.mdxdocs/adaptive-plugin/configuration.mdxcrates/adaptive/src/topology.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/src/lib.rscrates/adaptive/src/acg_learner.rs
**/*.rs
📄 CodeRabbit inference engine (.agents/skills/add-binding-feature/SKILL.md)
Use
snake_casenaming convention for Rust identifiers (e.g.,nemo_relay_tool_call)
**/*.rs: Any Rust change must runjust test-rust
Any Rust change must runcargo fmt --all
Any Rust change must runcargo clippy --workspace --all-targets -- -D warnings
**/*.rs: Runcargo fmt --allfor all FFI work since it is Rust work
Runjust test-rustto validate FFI changes
Runcargo clippy --workspace --all-targets -- -D warningsto enforce strict linting on FFI workWhen Rust files changed as part of Go work, also run
cargo fmt --all,just test-rust, andcargo clippy --workspace --all-targets -- -D warnings
**/*.rs: Runcargo fmt --allwhen Rust files are changed as part of Node work
Runcargo clippy --workspace --all-targets -- -D warningswhen Rust files are changed as part of Node work
Runjust test-rustwhen Rust files are changed as part of Node work
**/*.rs: Runcargo fmt --allto format all Rust code
Runcargo clippy --workspace --all-targets -- -D warningsto enforce all clippy lints as errors
**/*.rs: Runcargo fmt --allwhen Rust files changed as part of WebAssembly work
Runcargo clippy --workspace --all-targets -- -D warningswhen Rust files changed as part of WebAssembly work
**/*.rs: If any Rust code changed, always runjust test-rust
If any Rust code changed, also runcargo fmt --all
If any Rust code changed, also runcargo clippy --workspace --all-targets -- -D warnings
Run Rust formatting withcargo fmt --all
Run Rust linting withcargo clippy --workspace --all-targets -- -D warnings
**/*.rs: Usecargo fmtfor Rust code formatting
Runcargo clippy -- -D warningsto lint Rust code and treat all warnings as errors
Use Rust snake_case naming convention for Rust identifiers
Include SPDX license header in all Rust source files using double-slash comment syntax
Validate Rust code withuv run pre-commit run --all-filesto enforce cargo fmt formatting check, cargo clippy lints, and cargo deny aud...
Files:
crates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/src/lib.rscrates/adaptive/src/acg_learner.rs
{crates/adaptive/**,python/nemo_relay/adaptive.py,python/nemo_relay/plugin.py,go/nemo_relay/adaptive/**,go/nemo_relay/!(adaptive)/**,**/node/**,**/wasm/**}
📄 CodeRabbit inference engine (.agents/skills/maintain-optimizer/SKILL.md)
Keep adaptive surface in sync across crates/adaptive, shared plugin behavior in core and bindings, Python adaptive/plugin wrappers in python/nemo_relay/adaptive.py and python/nemo_relay/plugin.py, Go adaptive helpers under go/nemo_relay/adaptive plus shared plugin helpers in go/nemo_relay, and Node/WebAssembly adaptive helpers and plugin wrappers
Files:
crates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/src/lib.rscrates/adaptive/src/acg_learner.rs
{crates/adaptive/**,python/nemo_relay/plugin.py,go/nemo_relay/**,**/node/**,**/wasm/**}
📄 CodeRabbit inference engine (.agents/skills/maintain-optimizer/SKILL.md)
{crates/adaptive/**,python/nemo_relay/plugin.py,go/nemo_relay/**,**/node/**,**/wasm/**}: Maintain consistent plugin lifecycle across all language bindings (Python, Go, Node/WebAssembly, and Rust)
Keep plugin context surfaces aligned across all language implementations
Files:
crates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/src/lib.rscrates/adaptive/src/acg_learner.rs
{crates/adaptive/**/*.rs,**/*test*.{rs,py,go,ts,js},**/*adaptive*test*.{rs,py,go,ts,js},docs/plugins/adaptive/**}
📄 CodeRabbit inference engine (.agents/skills/maintain-optimizer/SKILL.md)
Maintain documented and tested validation and report behavior for adaptive surfaces
Files:
crates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/src/lib.rscrates/adaptive/src/acg_learner.rs
**/{Cargo.toml,**/*.rs}
📄 CodeRabbit inference engine (.agents/skills/maintain-packaging/SKILL.md)
Maintain consistency between Rust package names in
Cargo.tomland their actual usage across the codebase
Files:
crates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/src/lib.rscrates/adaptive/src/acg_learner.rs
**/*.{h,hpp,c,cpp,rs}
📄 CodeRabbit inference engine (.agents/skills/maintain-packaging/SKILL.md)
Ensure FFI header and library naming follows consistent conventions across platform-specific builds
Files:
crates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/src/lib.rscrates/adaptive/src/acg_learner.rs
{crates/core,crates/adaptive}/**/*
📄 CodeRabbit inference engine (.agents/skills/prepare-pr/SKILL.md)
Changes to
crates/coreorcrates/adaptivemust run the full language matrix
Files:
crates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/src/lib.rscrates/adaptive/src/acg_learner.rs
**/*.{rs,toml}
📄 CodeRabbit inference engine (.agents/skills/rename-surfaces/SKILL.md)
Update Rust crate names and module prefixes during coordinated rename operations
Files:
crates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/src/lib.rscrates/adaptive/src/acg_learner.rs
crates/{core,adaptive}/**
📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)
If
crates/coreorcrates/adaptivechanged, run the full matrix across Rust, Python, Go, Node.js, and WebAssembly
Files:
crates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/src/lib.rscrates/adaptive/src/acg_learner.rs
**/*.{rs,py,js,ts,tsx,jsx,go,sh,toml,yaml,yml,md}
📄 CodeRabbit inference engine (AGENTS.md)
Keep SPDX headers on source, docs, scripts, and configuration files. The project is Apache-2.0.
Files:
crates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/src/lib.rscrates/adaptive/src/acg_learner.rs
**/*.{rs,py,go,js,ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
Follow binding naming conventions: Rust and Python use
snake_case, C FFI exports prefixednemo_relay_, Go usesPascalCasefor public APIs, Node.js usescamelCase.
Files:
crates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/src/lib.rscrates/adaptive/src/acg_learner.rs
crates/**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
crates/**/*.rs: Keep async behavior on the existing tokio-based model. Bindings should preserve callback and future lifetimes rather than blocking or hiding async work unexpectedly.
UseJson = serde_json::Valuein Rust-facing runtime APIs for JSON payload handling.
Files:
crates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/src/lib.rscrates/adaptive/src/acg_learner.rs
crates/{core,adaptive}/**/*.rs
⚙️ CodeRabbit configuration file
crates/{core,adaptive}/**/*.rs: Review the Rust runtime for async correctness, scope isolation, middleware ordering, and event lifecycle regressions.
Pay close attention to task-local/thread-local scope propagation, callback lifetimes, stream finalization, and root_uuid isolation.
Public API changes should preserve existing behavior unless tests and docs show the intended migration path.
Files:
crates/adaptive/tests/unit/topology_tests.rscrates/adaptive/src/topology.rscrates/adaptive/src/adaptive_hints_intercept.rscrates/adaptive/src/tool_parallelism_learner.rscrates/adaptive/src/lib.rscrates/adaptive/src/acg_learner.rs
{crates/**/tests/**,python/tests/**,go/nemo_relay/**/*_test.go}
⚙️ CodeRabbit configuration file
{crates/**/tests/**,python/tests/**,go/nemo_relay/**/*_test.go}: Tests should cover the behavior promised by the changed API surface, including error paths and cross-request isolation where relevant.
Prefer assertions on lifecycle events, scope stacks, middleware ordering, and binding parity over shallow smoke tests.
Files:
crates/adaptive/tests/unit/topology_tests.rs
🔇 Additional comments (7)
docs/adaptive-plugin/topology-aware-controls-design.mdx (1)
1-83: LGTM!docs/adaptive-plugin/configuration.mdx (2)
202-235: Rust example now properly demonstrates topology-aware configuration.The Rust example imports are extended with
ConvergenceConfigandDriftConfig(line 205), and the initialization now explicitly setstool_parallelism.drift,adaptive_hints.governor,acg.convergence, and top-levelconvergencefields (lines 221-235). This matches the Python and Node.js examples' completeness and addresses the prior guidance about explicit cross-language parity.
249-268: TOML example already includes all topology-aware control blocks with correct defaults and cross-language parity.The TOML configuration snippet (lines 74-110) contains all three topology-aware control sections documented in the field table:
[components.config.tool_parallelism.drift],[components.config.adaptive_hints.governor], and[components.config.acg.convergence]. Field names and default values align with the Python, Node.js, and Rust examples. No changes needed.docs/adaptive-plugin/acg.mdx (1)
224-238: No changes needed; convergence validation rules are correctly documented in the "Common Validation Failures" section.docs/adaptive-plugin/about.mdx (1)
1-64: SPDX header is correctly formatted.The MDX file includes the required SPDX header with proper JSX comment delimiters
{/* ... */}after the YAML frontmatter. No action needed.crates/adaptive/src/adaptive_hints_intercept.rs (1)
137-150: LGTM!Also applies to: 183-196, 223-227, 248-258
crates/adaptive/src/lib.rs (1)
34-34: Please confirm the required Rust validation commands were run for this change set.I only see explicit test counts in the PR context; please confirm
just test-rust,cargo fmt --all, andcargo clippy --workspace --all-targets -- -D warningswere executed successfully.As per coding guidelines, "If any Rust code changed, always run
just test-rust; also runcargo fmt --all; also runcargo clippy --workspace --all-targets -- -D warnings."Source: Coding guidelines
Signed-off-by: Teerth Sharma <teerths57@gmail.com>
|
@willkill07 @lvojtku ready for re-review when you have bandwidth. The requested changes have been addressed on the current head
Validation for the latest docs move: |
willkill07
left a comment
There was a problem hiding this comment.
Additionally, the design doc isn't something that should be published as documentation.
I'm still trying to understand the benefit besides "just because we can"
There was a problem hiding this comment.
This is not a design document showing benefits. You state measurable Outcomes but do not provide any samples to demonstrate usefulness.
There was a problem hiding this comment.
Addressed in 012a96643d566113e7d53c00505638c611ea07e6 by adding a Sample Evidence section to this design doc.
It now includes concrete samples for each control:
- ACG convergence: benchmark fixture with
observations-to-decision: without=50, with=3 - tool drift: sample cohort centroid movement from
[1, 2, 0.0, 2]to[2, 3, 0.4, 3], showing stale plan invalidation - hint governor: sample learned hint shed at
latency_sensitivity = 2.0underepsilon = 10.0, while manual override at11.0still emits hints
I also ran cd fern && npx fern check --warnings; it passes with only the expected unauthenticated redirects warning.
There was a problem hiding this comment.
Addressed in 131ffbc60948a0bcc2d8b8d813c8283f7eeb48d6.
Changes made:
- moved the design note out of the published Fern docs tree to
.github/designs/topology-aware-adaptive-controls.md - removed the public docs link/page from
docs/adaptive-plugin/about.mdxanddocs/adaptive-plugin/configuration.mdx - rewrote the design note around benefit gates rather than capability: each control now has a concrete gate for when it should stay disabled versus when it is useful
- kept the sample evidence in the internal note, including the benchmark sample
observations-to-decision: without=50, with=3
Docs validation still passes with cd fern && npx fern check --warnings.
There was a problem hiding this comment.
On it will give measure samples as many i can on my local device
Signed-off-by: Teerth Sharma <teerths57@gmail.com>
Signed-off-by: Teerth Sharma <teerths57@gmail.com>
Signed-off-by: teerth sharma <teerths57@gmail.com>
|
@teerthsharma @KarmicDaoist Let's try this again:
The branch name used for the PR includes the name |
Signed-off-by: teerth sharma <teerths57@gmail.com>
|
Thanks for the direct feedback. I understand the concern. I will use only this GitHub account ( I also want to be clear about GenAI usage: I used AI assistance for drafting, test planning, and review passes, but I personally own the PR, the validation, and the decision to submit it. I am not asking you to merge unvetted generated code. The core architecture idea (topology) behind this PR comes from my open research work, arXiv:2604.19792. The Relay implementation here is not presented as research ornamentation; it is a guarded runtime hygiene layer:
The branch name containing Since your comment, I added one final hardening commit ( Validation I ran locally:
I do not intend to add more support comments or follow-up bugfix commits before review. This is my final review request for the current PR state. |
|
@teerthsharma the ArXiV publication has exactly one in-passing mention of using betti numbers. The main contribution of your work is not in line with this open PR. I need more substantial evidence to merge in ~5k lines of code that maintainers of Relay would have to up-keep and maintain. Again, the benchmarks show it does fundamentally work, but I still do not see the measured cost benefit. Until that is shown where we can also justify the maintenance cost, I'm highly reluctant to merge this functionality. |
|
To clarify my previous note, when I mentioned being "inspired" by the ArXiV publication, I meant it in the broader sense of leveraging topological functions for this architecture, rather than a direct 1:1 implementation of that arrangement as every repo and use cases are different rest my github have a lot of such examples which as you noticed I am applying everywhere |
I'm really looking for an example that outperforms the current online adaptive config in some benchmark (PinchBench, SWEBench, etc). Measurement of success should be rooted in cost savings or measured e2e execution improvements. I understand that the latter may be harder to prove unless you have a dynamo-accelerated endpoint. and the former is only currently usable for anthropic cache control. But we need to see some numbers with clear benefit. Synthetic benchmarks alone do not suffice. |
Overview
This pull request adds optional topology-aware controls to the existing
nemo-relay-adaptiveruntime. The controls are internal adaptive behavior, not a new public topology package: the separatenemo-relay-adaptive-topologycrate has been removed, and the proposed Pythonnemo_relay.adaptive_topologyprimitive surface has been removed.Reviewer-facing design note:
.github/designs/topology-aware-adaptive-controls.md.This design note is intentionally not published in Fern documentation. It records the goals, non-goals, benefit gates, observable signals, sample evidence, and rollout expectations for the internal controls.
Details
This change keeps topology-aware behavior inside the adaptive plugin contract:
crates/adaptive/src/topology.rsas an internal module with the bounded primitives needed by the adaptive learners:ConvergenceDetector,DriftDetector,GeometricGovernor, andBettiNumbers.crates/adaptive-topologyworkspace crate and its Cargo workspace dependency.crates/adaptive/tests/unit/topology_tests.rs..github/designs/.The three measurable controls are:
Sample evidence captured in the design note:
Local validation:
Validation notes:
could not start locally because
uvfailed while removing.venv\lib64with WindowsAccess is denied (os error 5)before test collection.could not start in this Windows shell because
justcould not findcygpathto translate the recipe shebang path. Running Fern directly passedfern check --warnings; direct broken-link validation still reports missing ignored generated API reference pages because the docs API-reference generation step could not be run throughjust docsin this shell.Where should the reviewer start?
Start with the design and scope reduction:
Then review the runtime integration points:
Then check binding and docs parity for configuration-only public surfaces:
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
.github/designs/and is not published in Fern docs.