feat: add topology-aware adaptive controls #282

Open
teerthsharma wants to merge 11 commits into NVIDIA:main from teerthsharma:POC/aether-topological-adaptive-runtime

@teerthsharma teerthsharma commented Jun 18, 2026

Overview

This pull request adds optional topology-aware controls to the existing nemo-relay-adaptive runtime. The controls are internal adaptive behavior, not a new public topology package: both the separate nemo-relay-adaptive-topology crate and the proposed Python nemo_relay.adaptive_topology primitive surface have been removed.

Reviewer-facing design note: .github/designs/topology-aware-adaptive-controls.md.

This design note is intentionally not published in Fern documentation. It records the goals, non-goals, benefit gates, observable signals, sample evidence, and rollout expectations for the internal controls.

  • I confirm this contribution is my own work, or I have the right to submit it under this project's license.
  • I searched existing issues and open pull requests, and this does not duplicate existing work.

Details

This change keeps topology-aware behavior inside the adaptive plugin contract:

  • Adds crates/adaptive/src/topology.rs as an internal module with the bounded primitives needed by the adaptive learners: ConvergenceDetector, DriftDetector, GeometricGovernor, and BettiNumbers.
  • Removes the separate crates/adaptive-topology workspace crate and its Cargo workspace dependency.
  • Removes the proposed public Python topology primitive bindings and their tests. Public bindings now expose only the adaptive configuration knobs already needed for plugin activation.
  • Keeps test bodies outside source files by placing the new internal primitive tests in crates/adaptive/tests/unit/topology_tests.rs.
  • Keeps the reviewer-facing design note outside published docs under .github/designs/.
  • Updates Adaptive docs to avoid linking an internal design document while still documenting when users should enable the controls.

The three measurable controls are:

ACG convergence: stable profiles set StabilityAnalysisResult.converged=true and stop adding observations after at least stability_window epochs.
Tool-plan drift: stored fan-out plans are rebuilt when cohort drift exceeds tool_parallelism.drift.threshold.
Hint governor: learned hints below adaptive_hints.governor.epsilon are omitted, while manual latency-sensitivity overrides still inject hints.
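
The hint-governor rule above can be sketched in a few lines. This is a minimal illustration, not the code in adaptive_hints_intercept.rs: the function name `should_inject`, its parameters, the `Option<f64>` shape of the manual override, and the handling of non-finite scores are all assumptions. Only the rule itself (learned hints below epsilon are omitted, manual overrides still inject) comes from the description.

```rust
/// Hypothetical sketch of the hint-governor gate described above.
/// `learned_sensitivity` is an assumed learned latency-sensitivity score,
/// `epsilon` stands in for `adaptive_hints.governor.epsilon`, and
/// `manual_override` models a user-supplied latency-sensitivity override.
fn should_inject(learned_sensitivity: f64, epsilon: f64, manual_override: Option<f64>) -> bool {
    // A manual override always wins: the hint is injected regardless of epsilon.
    if manual_override.is_some() {
        return true;
    }
    // Assumption: a non-finite score carries no usable signal, so it is shed.
    if !learned_sensitivity.is_finite() {
        return false;
    }
    // Learned hints strictly below epsilon are omitted.
    learned_sensitivity >= epsilon
}
```

With `epsilon = 0.1`, a learned score of `0.05` is shed when no override is present and injected when one is.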

Sample evidence captured in the design note:

ACG observations-to-decision: without=50, with=3
Tool-plan drift: stale fan-out plan invalidates after cohort centroid moves beyond threshold
Hint governor: low-value learned hints are shed while manual overrides are preserved

Local validation:

cargo fmt --all
cargo test -p nemo-relay-adaptive topology -- --nocapture
cargo test -p nemo-relay-adaptive --test topology_convergence -- --nocapture
cargo test -p nemo-relay-adaptive --test tool_parallelism_plan -- --nocapture
cargo test -p nemo-relay-adaptive adaptive_hints_governor -- --nocapture
cargo test -p nemo-relay-adaptive --lib
cargo check -p nemo-relay-python
cargo clippy -p nemo-relay-adaptive --all-targets -- -D warnings
cargo bench -p nemo-relay-adaptive --bench convergence_bench -- --sample-size 10
cd fern && npx fern check --warnings

Validation notes:

uv run pytest python/tests/test_adaptive.py python/tests/test_adaptive_config.py -q

This could not start locally: uv failed while removing .venv\lib64 with the Windows error "Access is denied (os error 5)" before test collection.

just docs

This could not start in this Windows shell because just could not find cygpath to translate the recipe shebang path. Running Fern directly passed fern check --warnings. Direct broken-link validation still reports the ignored, generated API reference pages as missing, because the docs API-reference generation step could not be run through just docs in this shell.

Where should the reviewer start?

Start with the design and scope reduction:

.github/designs/topology-aware-adaptive-controls.md
Cargo.toml
Cargo.lock
crates/adaptive/Cargo.toml
crates/adaptive/src/topology.rs
crates/adaptive/tests/unit/topology_tests.rs

Then review the runtime integration points:

crates/adaptive/src/acg_learner.rs
crates/adaptive/src/tool_parallelism_learner.rs
crates/adaptive/src/adaptive_hints_intercept.rs
crates/adaptive/src/runtime/validation.rs
crates/adaptive/src/plugin_component.rs
crates/adaptive/tests/integration/topology_convergence_tests.rs
crates/adaptive/tests/unit/tool_parallelism_learner_tests.rs
crates/adaptive/tests/unit/adaptive_hints_intercept_tests.rs

Then check binding and docs parity for configuration-only public surfaces:

python/nemo_relay/adaptive.py
python/nemo_relay/adaptive.pyi
crates/node/adaptive.js
crates/node/adaptive.d.ts
go/nemo_relay/adaptive.go
go/nemo_relay/adaptive/adaptive.go
docs/adaptive-plugin/configuration.mdx

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • Relates to: none

Summary by CodeRabbit

  • New Features
    • Added topology-aware early stopping for Adaptive Cache Governor (ACG) with optional convergence controls.
    • Added topology-aware drift detection for tool parallelism planning to invalidate stale execution plans.
    • Added a hint governor to load-shed low-sensitivity learned hints under load.
  • Improvements
    • Reused cached convergence results to avoid redundant stability analysis once converged.
    • Extended configuration + validation surfaces across Rust, Node, and Python (including new helper constructors/builders).
  • Documentation
    • Updated Adaptive plugin docs with convergence/drift/governor configuration guidance. The reviewer-facing design note is internal under .github/designs/ and is not published in Fern docs.
  • Tests
    • Added unit and integration coverage for convergence timing, drift invalidation, and governor behavior.

@teerthsharma teerthsharma requested review from a team and lvojtku as code owners June 18, 2026 05:55
copy-pr-bot Bot commented Jun 18, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions Bot added the labels size:XXL (PR is very large), Feature (a new feature), lang:go (PR changes/introduces Go code), lang:js (PR changes/introduces Javascript/Typescript code), lang:python (PR changes/introduces Python code), and lang:rust (PR changes/introduces Rust code) on Jun 18, 2026
coderabbitai Bot commented Jun 18, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds three internal topology-aware control primitives (ConvergenceDetector, DriftDetector, GeometricGovernor) to crates/adaptive/src/topology.rs, then integrates them into AcgLearner (convergence-based early stopping with per-profile detector registry), ToolParallelismLearner (drift-triggered plan invalidation via centroid feature vector), and AdaptiveHintsIntercept (load-shedding governor gating learned hint injection). Extends typed configuration schemas across Rust, Node.js, Go, and Python SDKs with GovernorConfig, DriftConfig, and ConvergenceConfig, each defaulting to disabled. Updates all existing adaptive plugin documentation with capability descriptions, field tables, validation rules, and per-language configuration examples.
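
As a rough illustration of the drift-triggered invalidation described in this walkthrough, the sketch below compares consecutive centroids by L2 distance. It is not the PR's `DriftDetector<D>`: the type name `DriftSketch`, its fields, and the choice to compare only against the previous centroid (rather than estimating a velocity) are assumptions made for the example.

```rust
/// Hypothetical, simplified drift check over D-dimensional centroids.
struct DriftSketch<const D: usize> {
    last_centroid: Option<[f64; D]>,
    threshold: f64,
}

impl<const D: usize> DriftSketch<D> {
    fn new(threshold: f64) -> Self {
        Self { last_centroid: None, threshold }
    }

    /// Records a new centroid and returns true when its L2 distance from the
    /// previously recorded centroid exceeds the threshold.
    fn update(&mut self, centroid: [f64; D]) -> bool {
        let drifted = match self.last_centroid {
            Some(prev) => l2_distance(&prev, &centroid) > self.threshold,
            None => false, // the first observation has no baseline to drift from
        };
        self.last_centroid = Some(centroid);
        drifted
    }
}

/// Euclidean (L2) distance between two fixed-size vectors.
fn l2_distance<const D: usize>(a: &[f64; D], b: &[f64; D]) -> f64 {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f64>()
        .sqrt()
}
```

A caller in the spirit of `ToolParallelismLearner::process_run` would drop its stored fan-out plan whenever `update` returns true and rebuild it from fresh observations.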

Changes

Topology-Aware Adaptive Controls

Layer / File(s) Summary
topology.rs primitives and unit tests
crates/adaptive/src/topology.rs, crates/adaptive/tests/unit/topology_tests.rs
Adds ring-buffer history, BettiNumbers, ConvergenceDetector (Betti stability + monotonic drift checking), DriftDetector<D> (centroid L2 velocity estimation), GeometricGovernor (PD epsilon adaptation), numeric sanitization helpers, and comprehensive unit tests covering non-finite input handling and clamping.
StabilityAnalysisResult.converged field
crates/adaptive/src/acg/stability.rs, crates/adaptive/tests/unit/acg/*, crates/adaptive/tests/unit/acg_component_tests.rs, crates/adaptive/tests/unit/storage_tests.rs, crates/adaptive/tests/unit/storage_memory_internal_tests.rs, crates/adaptive/tests/unit/types_tests.rs, crates/adaptive/tests/unit/cache_diagnostics_tests.rs, crates/adaptive/tests/unit/intercepts_tests.rs, crates/python/tests/coverage/py_storage_coverage_tests.rs
Adds pub converged: bool with #[serde(default)] to StabilityAnalysisResult; propagates converged: false to all test fixtures across Rust and Python storage tests.
Typed config schema across all SDKs
crates/adaptive/src/config.rs, crates/node/adaptive.d.ts, crates/node/adaptive.js, go/nemo_relay/adaptive.go, go/nemo_relay/adaptive/adaptive.go, python/nemo_relay/adaptive.py, python/nemo_relay/adaptive.pyi
Introduces GovernorConfig, DriftConfig, ConvergenceConfig in Rust (with editor schema registration), Node.js (TypeScript interfaces and factory functions), Go (base types and convenience aliases), and Python (dataclasses and type stubs). Wires optional fields into AdaptiveConfig, AdaptiveHintsComponentConfig, ToolParallelismComponentConfig, and AcgComponentConfig with factory/builder functions across all SDKs; all default to disabled.
AcgLearner convergence integration
crates/adaptive/src/acg_learner.rs
Adds new_with_convergence constructor, per-profile ConvergenceDetector registry (RwLock-protected HashMap), and record_stability_epoch (maps stability analysis to BettiNumbers, enforces minimum epoch count derived from stability_window). process_run gains a fast-path that skips observation window integration and reanalysis when cached stability is already converged: true, and marks convergence true when detector confirms with sufficient epochs.
Hint injection governor and tool drift invalidation
crates/adaptive/src/adaptive_hints_intercept.rs, crates/adaptive/src/tool_parallelism_learner.rs
AdaptiveHintsIntercept::with_governor(...) adds mutex-protected HintGovernor wrapping GeometricGovernor; injection becomes conditional via should_inject_hints (forced on manual override, otherwise governed). ToolParallelismLearner::new_with_drift adds optional drift config and lock-protected DriftDetector<4>; process_run computes 4-element cohort centroid feature vector and invalidates stored plan when L2 drift exceeds threshold.
Runtime wiring, validation, and plugin config
crates/adaptive/src/lib.rs, crates/adaptive/src/runtime/features.rs, crates/adaptive/src/runtime/validation.rs, crates/adaptive/src/plugin_component.rs, crates/adaptive/Cargo.toml
Threads topology config through pending_features → TelemetryFeature::new → build_learners, switching to new_with_drift/new_with_convergence. Adds validate_convergence and validate_positive_finite for numeric bounds checking. Expands plugin allowlists for governor/drift/convergence nested fields with keyset validation. Declares topology module and GLOBAL_RUNTIME_TEST_MUTEX test support; registers convergence_bench and integration test targets.
Tests, benchmarks, and test synchronization
crates/adaptive/benches/convergence_bench.rs, crates/adaptive/tests/integration/topology_convergence_tests.rs, crates/adaptive/tests/unit/tool_parallelism_learner_tests.rs, crates/adaptive/tests/unit/adaptive_hints_intercept_tests.rs, crates/adaptive/tests/unit/runtime_features_tests.rs, crates/adaptive/tests/unit/plugin_component_tests.rs, crates/adaptive/tests/unit/runtime_tests.rs, crates/adaptive/tests/integration/tool_parallelism_plan_tests.rs, crates/node/tests/adaptive_tests.mjs, python/tests/test_adaptive.py, python/tests/test_adaptive_config.py, go/nemo_relay/adaptive_test.go
Adds Criterion convergence benchmark (with/without convergence, asserts ≤ observation delta), topology_convergence integration test (timing constraints and cached-reuse validation), drift invalidation unit test, and hints governor unit test. Migrates runtime/plugin tests to GLOBAL_RUNTIME_TEST_MUTEX for coordinated global-state access. Adds SDK config tests for new topology fields. Fixes duplicate struct field literals in tool_parallelism_plan_tests.
Documentation: design, capabilities, and config reference
docs/adaptive-plugin/topology-aware-controls-design.mdx, docs/adaptive-plugin/about.mdx, docs/adaptive-plugin/acg.mdx, docs/adaptive-plugin/adaptive-hints.mdx, docs/adaptive-plugin/configuration.mdx, crates/adaptive/README.md
Adds new design page covering internal primitives, signal transformations (stability → Betti, cohorts → centroid), measurable outcomes by control, and rollout guidance (disabled by default, sequential enablement). Updates capability bullets, field tables with defaults/validation, "When You Get/Use" guidance, and per-language examples (TOML, Python, Node.js, Rust) throughout.

Pre-commit ty Hook Fix

Layer / File(s) Summary
ty hook --exclude pattern update
.pre-commit-config.yaml
Changes --exclude patterns from path/** recursive glob to path/ trailing-slash directory-only form; adds --force-exclude.

Sequence Diagram(s)

sequenceDiagram
  participant Runtime as AdaptiveRuntime
  participant Feature as TelemetryFeature
  participant ACG as AcgLearner
  participant Detector as ConvergenceDetector
  participant Backend as AdaptiveBackend

  Runtime->>Feature: new(tool_parallelism, acg, convergence)
  Feature->>ACG: new_with_convergence(agent_id, window, convergence_cfg)
  loop Each process_run
    ACG->>Backend: load_stability(profile_key)
    alt cached converged=true
      ACG->>ACG: reuse cached stability counts
    else converged=false
      ACG->>ACG: build rolling observation window
      ACG->>ACG: analyze_stability()
      ACG->>Detector: record_epoch(betti, drift, error)
      Detector-->>ACG: is_converged() && epoch >= stability_window
      alt converged confirmed
        ACG->>Backend: store_stability(converged=true)
      else not yet converged
        ACG->>Backend: store_stability(converged=false)
      end
    end
  end
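
The control flow in the ACG diagram above can be sketched as a pure function. This is a simplified model under stated assumptions, not `AcgLearner::process_run`: the `Stability` struct, the function signature, and the rule of adding exactly one observation per non-converged run are hypothetical. The gating condition (`is_converged() && epoch >= stability_window`) and the cached fast path are taken from the diagram.

```rust
/// Hypothetical stand-in for the cached stability record; only the
/// observation count and the `converged` flag matter for this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Stability {
    total_observations: usize,
    converged: bool,
}

/// One simplified `process_run` step.
fn process_run(
    cached: Stability,
    detector_converged: bool,
    epoch: usize,
    stability_window: usize,
) -> Stability {
    if cached.converged {
        // Fast path: reuse the cached result and add no new observation.
        return cached;
    }
    Stability {
        // Slow path: integrate one more observation and re-evaluate.
        total_observations: cached.total_observations + 1,
        // Convergence is confirmed only after at least `stability_window` epochs.
        converged: detector_converged && epoch >= stability_window,
    }
}
```

Once a profile is stored with `converged = true`, every later call returns the cached value unchanged, which is the behavior the "observations-to-decision" evidence in the description relies on.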
sequenceDiagram
  participant Intercept as AdaptiveHintsIntercept
  participant Governor as HintGovernor
  participant GGov as GeometricGovernor
  participant Request as LLM Request

  Intercept->>Intercept: build hints from HotCache
  Intercept->>Intercept: should_inject_hints(hints, manual_ls)
  alt manual latency override set
    Intercept->>Request: inject AgentHints (forced)
  else no manual override
    alt HintGovernor configured
      Intercept->>Governor: allow(latency_sensitivity)
      Governor->>GGov: should_trigger(deviation)
      GGov-->>Governor: bool
      Governor->>GGov: adapt(observed_rate, dt)
      alt allowed
        Intercept->>Request: inject AgentHints
      else shed
        Intercept->>Request: skip injection
      end
    else no governor
      Intercept->>Request: inject AgentHints
    end
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | Title follows Conventional Commits format with lowercase 'feat' type and concise imperative summary under 72 characters. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | PR description is well-structured and comprehensive, covering overview, architectural decisions, implementation details, test strategy, and reviewer guidance. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/adaptive/tests/unit/storage_tests.rs (1)

82-94: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Assert converged in stability round-trip tests.

The fixture now includes converged, but the round-trip test only checks stable_prefix_length and total_observations. Add an assertion for loaded_stability.converged to lock this storage contract, since runtime behavior reads this field.

Proposed test assertion
     assert_eq!(loaded_stability.stable_prefix_length, 1);
     assert_eq!(loaded_stability.total_observations, 3);
+    assert!(!loaded_stability.converged);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/adaptive/tests/unit/storage_tests.rs` around lines 82 - 94, In the
round-trip test that uses the sample_stability fixture, add an assertion to
verify that loaded_stability.converged equals the value set in the
sample_stability function (which is false). This assertion should be added
alongside the existing assertions for stable_prefix_length and
total_observations to ensure the converged field is properly persisted and
loaded during the serialization round-trip.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/adaptive-topology/src/convergence.rs`:
- Around line 150-156: The stability_window parameter in the new method is only
being clamped to a minimum value using max(MIN_STABILITY_WINDOW), but is not
clamped to a maximum value. Since the RingBuffer instances (betti_history,
drift_history, error_history) have a fixed capacity of MAX_HISTORY (32), if
stability_window exceeds this capacity, methods like is_betti_stable and
is_drift_decreasing will never satisfy their length checks, making
topology-based convergence unreachable. Apply both minimum and maximum clamping
to stability_window by using a method that clamps it between
MIN_STABILITY_WINDOW and MAX_HISTORY.
- Line 166: The `epoch` field is typed as `u32` and the increment operation on
line 166 (self.epoch += 1) can cause integer overflow and wraparound in
long-running detectors, breaking epoch-based gating and diagnostics. Change the
type of the `epoch` field from `u32` to `u64` throughout the convergence module
to prevent wraparound and ensure reliable epoch tracking for the lifetime of the
detector.

In `@crates/adaptive-topology/src/geometry.rs`:
- Around line 252-257: The pruning_ratio method is counting true values across
the entire active_mask array, but only the first self.counts[0] entries
represent actual populated level-0 blocks. When the mask contains true entries
beyond self.counts[0], this causes the calculation to produce invalid ratios.
Restrict the active count calculation to only the first self.counts[0] elements
of active_mask by using slice notation to iterate over
active_mask[..self.counts[0]] instead of the full active_mask.

In `@crates/adaptive-topology/src/manifold.rs`:
- Around line 91-100: The assertion in the `new` method uses `D * tau` as the
validation threshold, but the actual embedding definition only requires `(D - 1)
* tau + 1` samples. Update the assertion logic to use the correct formula `(D -
1) * tau + 1` instead of `D * tau`, and also apply the same correction to the
embed readiness threshold check around lines 121-123 to ensure consistency and
avoid unnecessarily delaying valid embeddings.
- Around line 239-246: The issue in the compute_betti_0 function is that nodes
are not marked as visited when pushed onto the stack, only when popped, causing
the same neighbor to be pushed multiple times. Under dense connectivity, this
wastes the fixed-size stack space and can cause the stack to overflow, skipping
reachable nodes and producing incorrect Betti number calculations. Fix this by
marking a neighbor as visited immediately when it is pushed onto the stack in
the section where stack[stack_top] = neighbor is executed, rather than deferring
the visited marking until pop-time, to prevent duplicate pushes of the same
neighbor.

In `@crates/adaptive/src/acg_learner.rs`:
- Around line 185-208: The issue is that store_stability is being called before
store_observations, which means if observation storage fails after stability is
marked as converged, the next run will skip observation repair permanently. To
fix this, locate all places where store_stability and store_observations are
called together (including the instances at lines 223-229 and 235-237 mentioned
in the comment), and reorder these calls so that store_observations is always
called before store_stability. This ensures that if observation storage fails,
the profile won't yet be marked as converged and can be retried on the next run.

In `@python/nemo_relay/_native.pyi`:
- Around line 1235-1236: The DriftDetector.update method stub currently accepts
Sequence[float] for the centroid parameter, but the native binding requires a
fixed 3-element array. Change the centroid parameter type from Sequence[float]
to a fixed-size type representation (such as a tuple of exactly three floats) to
match the native binding's expectations. Additionally, update the docstring to
explicitly clarify that the centroid must be a 3-dimensional coordinate with
exactly three float values.

---

Outside diff comments:
In `@crates/adaptive/tests/unit/storage_tests.rs`:
- Around line 82-94: In the round-trip test that uses the sample_stability
fixture, add an assertion to verify that loaded_stability.converged equals the
value set in the sample_stability function (which is false). This assertion
should be added alongside the existing assertions for stable_prefix_length and
total_observations to ensure the converged field is properly persisted and
loaded during the serialization round-trip.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 8b2fd9ec-2173-410e-a7fa-73fc991c831b

📥 Commits

Reviewing files that changed from the base of the PR and between d5c2407 and da2470a.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (66)
  • .pre-commit-config.yaml
  • ATTRIBUTIONS-Rust.md
  • Cargo.toml
  • crates/adaptive-topology/Cargo.toml
  • crates/adaptive-topology/README.md
  • crates/adaptive-topology/src/convergence.rs
  • crates/adaptive-topology/src/drift.rs
  • crates/adaptive-topology/src/geometry.rs
  • crates/adaptive-topology/src/governor.rs
  • crates/adaptive-topology/src/lib.rs
  • crates/adaptive-topology/src/manifold.rs
  • crates/adaptive-topology/src/topology.rs
  • crates/adaptive/Cargo.toml
  • crates/adaptive/README.md
  • crates/adaptive/benches/convergence_bench.rs
  • crates/adaptive/src/acg/stability.rs
  • crates/adaptive/src/acg_learner.rs
  • crates/adaptive/src/adaptive_hints_intercept.rs
  • crates/adaptive/src/config.rs
  • crates/adaptive/src/lib.rs
  • crates/adaptive/src/plugin_component.rs
  • crates/adaptive/src/runtime/features.rs
  • crates/adaptive/src/runtime/validation.rs
  • crates/adaptive/src/tool_parallelism_learner.rs
  • crates/adaptive/tests/integration/runtime_integration_tests.rs
  • crates/adaptive/tests/integration/tool_parallelism_plan_tests.rs
  • crates/adaptive/tests/integration/topology_convergence_tests.rs
  • crates/adaptive/tests/unit/acg/economics_internal_tests.rs
  • crates/adaptive/tests/unit/acg/economics_policy_tests.rs
  • crates/adaptive/tests/unit/acg/multi_breakpoint_tests.rs
  • crates/adaptive/tests/unit/acg_component_tests.rs
  • crates/adaptive/tests/unit/adaptive_hints_intercept_tests.rs
  • crates/adaptive/tests/unit/cache_diagnostics_tests.rs
  • crates/adaptive/tests/unit/config_tests.rs
  • crates/adaptive/tests/unit/intercepts_tests.rs
  • crates/adaptive/tests/unit/plugin_component_tests.rs
  • crates/adaptive/tests/unit/runtime_features_tests.rs
  • crates/adaptive/tests/unit/runtime_tests.rs
  • crates/adaptive/tests/unit/storage_memory_internal_tests.rs
  • crates/adaptive/tests/unit/storage_tests.rs
  • crates/adaptive/tests/unit/tool_parallelism_learner_tests.rs
  • crates/adaptive/tests/unit/types_tests.rs
  • crates/node/adaptive.d.ts
  • crates/node/adaptive.js
  • crates/node/tests/adaptive_tests.mjs
  • crates/python/Cargo.toml
  • crates/python/src/lib.rs
  • crates/python/src/py_adaptive_topology.rs
  • crates/python/tests/coverage/py_storage_coverage_tests.rs
  • docs/adaptive-plugin/about.mdx
  • docs/adaptive-plugin/acg.mdx
  • docs/adaptive-plugin/adaptive-hints.mdx
  • docs/adaptive-plugin/configuration.mdx
  • go/nemo_relay/adaptive.go
  • go/nemo_relay/adaptive/adaptive.go
  • go/nemo_relay/adaptive_test.go
  • python/nemo_relay/__init__.py
  • python/nemo_relay/__init__.pyi
  • python/nemo_relay/_native.pyi
  • python/nemo_relay/adaptive.py
  • python/nemo_relay/adaptive.pyi
  • python/nemo_relay/adaptive_topology.py
  • python/nemo_relay/adaptive_topology.pyi
  • python/tests/test_adaptive.py
  • python/tests/test_adaptive_config.py
  • python/tests/test_adaptive_topology.py

Comment on lines +150 to +156
pub fn new(epsilon: f64, stability_window: usize) -> Self {
    Self {
        betti_history: RingBuffer::new(),
        drift_history: RingBuffer::new(),
        error_history: RingBuffer::new(),
        stability_window: stability_window.max(MIN_STABILITY_WINDOW),
        epsilon: sanitize_positive(epsilon, DEFAULT_EPSILON),


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clamp stability_window to ring-buffer capacity.

At Line 155, only a minimum clamp is applied. If callers pass a value larger than MAX_HISTORY (32), is_betti_stable and is_drift_decreasing can never satisfy their length checks, so topology-based convergence becomes unreachable.

Proposed fix
-            stability_window: stability_window.max(MIN_STABILITY_WINDOW),
+            stability_window: stability_window.clamp(MIN_STABILITY_WINDOW, MAX_HISTORY),
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/adaptive-topology/src/convergence.rs` around lines 150 - 156, The
stability_window parameter in the new method is only being clamped to a minimum
value using max(MIN_STABILITY_WINDOW), but is not clamped to a maximum value.
Since the RingBuffer instances (betti_history, drift_history, error_history)
have a fixed capacity of MAX_HISTORY (32), if stability_window exceeds this
capacity, methods like is_betti_stable and is_drift_decreasing will never
satisfy their length checks, making topology-based convergence unreachable.
Apply both minimum and maximum clamping to stability_window by using a method
that clamps it between MIN_STABILITY_WINDOW and MAX_HISTORY.
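
To make the unreachable-window argument concrete, here is a small self-contained sketch. `MAX_HISTORY = 32` is the value quoted in the comment above; `MIN_STABILITY_WINDOW = 2` and all function names are placeholders assumed for the example, and the ring buffer is modeled only by its length.

```rust
// MAX_HISTORY matches the capacity quoted in the review comment.
// MIN_STABILITY_WINDOW = 2 is an assumed placeholder value.
const MIN_STABILITY_WINDOW: usize = 2;
const MAX_HISTORY: usize = 32;

/// Length of a ring buffer with capacity MAX_HISTORY after `pushed` pushes.
fn history_len(pushed: usize) -> usize {
    pushed.min(MAX_HISTORY)
}

/// The length check: a window can be evaluated only once enough history exists.
fn window_reachable(stability_window: usize, pushed: usize) -> bool {
    history_len(pushed) >= stability_window
}

/// Current behavior: only the lower bound is enforced.
fn min_only(window: usize) -> usize {
    window.max(MIN_STABILITY_WINDOW)
}

/// Proposed behavior: both bounds are enforced.
fn clamped(window: usize) -> usize {
    window.clamp(MIN_STABILITY_WINDOW, MAX_HISTORY)
}
```

With a requested window of 64, `min_only` keeps 64 and `window_reachable` stays false no matter how many epochs are pushed, while `clamped` yields 32 and the check passes after 32 pushes.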

self.betti_history.push(betti);
self.drift_history.push(sanitize_non_negative(drift));
self.error_history.push(sanitize_non_negative(error));
self.epoch += 1;


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Prevent epoch wraparound in long-running detectors.

Line 166 uses u32 += 1. This can overflow and wrap, which can break downstream epoch-based gating and diagnostics.

Proposed fix
-        self.epoch += 1;
+        self.epoch = self.epoch.saturating_add(1);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/adaptive-topology/src/convergence.rs` at line 166, The `epoch` field
is typed as `u32` and the increment operation on line 166 (self.epoch += 1) can
cause integer overflow and wraparound in long-running detectors, breaking
epoch-based gating and diagnostics. Change the type of the `epoch` field from
`u32` to `u64` throughout the convergence module to prevent wraparound and
ensure reliable epoch tracking for the lifetime of the detector.
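
For reference, the difference between the two increment behaviors can be shown with the standard integer methods. In Rust, a plain `+= 1` on a `u32` at its maximum panics in debug builds and wraps to 0 in release builds with default settings. The helper names below are hypothetical and only wrap `saturating_add` and `wrapping_add`.

```rust
/// Hypothetical helper: the increment used in the proposed fix.
/// The counter stops at u32::MAX instead of wrapping.
fn next_epoch_saturating(epoch: u32) -> u32 {
    epoch.saturating_add(1)
}

/// Hypothetical helper: models what a release build does with `epoch += 1`
/// at the boundary, where the counter wraps back to 0.
fn next_epoch_wrapping(epoch: u32) -> u32 {
    epoch.wrapping_add(1)
}
```

A saturated counter still satisfies any `epoch >= stability_window` gate, whereas a wrapped counter would fail it until enough epochs accumulate again. Widening the field to `u64`, as the agent prompt suggests, is a separate remedy from the `saturating_add` diff.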

Comment on lines +252 to +257
pub fn pruning_ratio(&self, active_mask: &[bool; MAX_BLOCKS]) -> f64 {
    if self.counts[0] == 0 {
        return 0.0;
    }
    let active = active_mask.iter().filter(|&&x| x).count();
    1.0 - (active as f64 / self.counts[0] as f64)


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Restrict pruning math to populated level-0 blocks.

At Line 256, active is counted across all MAX_BLOCKS. For caller-provided masks, true entries beyond self.counts[0] can produce invalid ratios (including negatives). Count only the tracked prefix.

Proposed fix
-        let active = active_mask.iter().filter(|&&x| x).count();
+        let active = active_mask[..self.counts[0]]
+            .iter()
+            .filter(|&&x| x)
+            .count();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 pub fn pruning_ratio(&self, active_mask: &[bool; MAX_BLOCKS]) -> f64 {
     if self.counts[0] == 0 {
         return 0.0;
     }
-    let active = active_mask.iter().filter(|&&x| x).count();
+    let active = active_mask[..self.counts[0]]
+        .iter()
+        .filter(|&&x| x)
+        .count();
     1.0 - (active as f64 / self.counts[0] as f64)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/adaptive-topology/src/geometry.rs` around lines 252 - 257, The
pruning_ratio method is counting true values across the entire active_mask
array, but only the first self.counts[0] entries represent actual populated
level-0 blocks. When the mask contains true entries beyond self.counts[0], this
causes the calculation to produce invalid ratios. Restrict the active count
calculation to only the first self.counts[0] elements of active_mask by using
slice notation to iterate over active_mask[..self.counts[0]] instead of the full
active_mask.
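
A stand-alone version of the restricted count is sketched below so the effect can be checked in isolation. `MAX_BLOCKS = 8` is an illustrative size (the real constant is not shown in this excerpt), `populated` plays the role of `self.counts[0]`, and the extra `min` guard against a count larger than the mask is an added assumption, not part of the proposed fix.

```rust
// Illustrative size only; the real MAX_BLOCKS is not shown in this excerpt.
const MAX_BLOCKS: usize = 8;

/// Hypothetical free-function form of `pruning_ratio`.
/// `populated` stands in for `self.counts[0]`.
fn pruning_ratio(populated: usize, active_mask: &[bool; MAX_BLOCKS]) -> f64 {
    if populated == 0 {
        return 0.0;
    }
    // Assumed guard: never slice past the end of the mask.
    let tracked = populated.min(MAX_BLOCKS);
    // Count active entries only within the populated prefix.
    let active = active_mask[..tracked].iter().filter(|&&x| x).count();
    1.0 - (active as f64 / tracked as f64)
}
```

For a mask with seven `true` entries and `populated = 2`, counting the whole mask would give 1 - 7/2 = -2.5, while the prefix-restricted count stays within [0, 1].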

Comment on lines +91 to +100
/// Panics if `D * tau` exceeds `EMBED_BUFFER_CAPACITY`. This is a
/// programming error, not a runtime failure.
pub fn new(tau: usize) -> Self {
    let tau = if tau == 0 { DEFAULT_TAU } else { tau };
    assert!(
        D * tau <= EMBED_BUFFER_CAPACITY,
        "TimeDelayEmbedder D * tau ({}) exceeds EMBED_BUFFER_CAPACITY ({})",
        D * tau,
        EMBED_BUFFER_CAPACITY
    );

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Embed readiness threshold is too strict for tau > 1.

Line 121 uses D * tau, but the embedding defined at Line 74 only needs samples back to x(t - (D - 1) * tau), i.e. (D - 1) * tau + 1 samples in total. The stricter threshold delays valid embeddings and over-restricts constructor validation.

Proposed fix
-        assert!(
-            D * tau <= EMBED_BUFFER_CAPACITY,
-            "TimeDelayEmbedder D * tau ({}) exceeds EMBED_BUFFER_CAPACITY ({})",
-            D * tau,
-            EMBED_BUFFER_CAPACITY
-        );
+        let required = if D == 0 {
+            0
+        } else {
+            (D - 1).saturating_mul(tau).saturating_add(1)
+        };
+        assert!(
+            required <= EMBED_BUFFER_CAPACITY,
+            "TimeDelayEmbedder required samples ({}) exceed EMBED_BUFFER_CAPACITY ({})",
+            required,
+            EMBED_BUFFER_CAPACITY
+        );
@@
-        let required = D * self.tau;
+        let required = if D == 0 {
+            0
+        } else {
+            (D - 1).saturating_mul(self.tau).saturating_add(1)
+        };
         if self.buffer_len < required {
             return None;
         }

Also applies to: 121-123

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/adaptive-topology/src/manifold.rs` around lines 91 - 100, The
assertion in the `new` method uses `D * tau` as the validation threshold, but
the actual embedding definition only requires `(D - 1) * tau + 1` samples.
Update the assertion logic to use the correct formula `(D - 1) * tau + 1`
instead of `D * tau`, and also apply the same correction to the embed readiness
threshold check around lines 121-123 to ensure consistency and avoid
unnecessarily delaying valid embeddings.
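
The sample-count argument can be checked with a small standalone sketch. The function names, the `Vec`-based history (oldest sample first), and the runtime `dim` parameter are illustrative assumptions; the crate itself uses a const generic `D` and a fixed buffer.

```rust
/// Samples needed to form [x(t), x(t - tau), ..., x(t - (dim - 1) * tau)].
/// The oldest coordinate looks back (dim - 1) * tau steps, so one sample more
/// than that lookback is sufficient.
fn required_samples(dim: usize, tau: usize) -> usize {
    if dim == 0 {
        0
    } else {
        (dim - 1).saturating_mul(tau).saturating_add(1)
    }
}

/// Builds the delay vector from `history` (oldest sample first), or returns
/// `None` when there are not yet enough samples.
fn embed(history: &[f64], dim: usize, tau: usize) -> Option<Vec<f64>> {
    if dim == 0 || history.len() < required_samples(dim, tau) {
        return None;
    }
    let newest = history.len() - 1;
    // Coordinate k reads the sample k * tau steps before the newest one.
    Some((0..dim).map(|k| history[newest - k * tau]).collect())
}
```

For `dim = 3` and `tau = 2`, five samples are enough to embed, whereas a `D * tau` threshold would wait for a sixth. For `tau = 1` the two thresholds coincide, which is why the discrepancy only shows up for `tau > 1`.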

Comment thread crates/adaptive-topology/src/manifold.rs Outdated
Comment thread crates/adaptive/src/acg_learner.rs
Comment thread python/nemo_relay/_native.pyi Outdated
@teerthsharma teerthsharma force-pushed the POC/aether-topological-adaptive-runtime branch from da2470a to 17488c8 Compare June 18, 2026 06:35
Signed-off-by: teerth sharma <teerths57@gmail.com>
@teerthsharma teerthsharma force-pushed the POC/aether-topological-adaptive-runtime branch from 17488c8 to 24ed6f2 Compare June 18, 2026 06:36

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/adaptive/tests/unit/storage_tests.rs (1)

243-253: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Assert the new converged field in the stability round-trip test.

Line 244 loads the full stability record, but the test never checks converged. Add an explicit assertion so storage regressions on this field are caught.

Suggested diff
     assert_eq!(loaded_stability.stable_prefix_length, 1);
     assert_eq!(loaded_stability.total_observations, 3);
+    assert!(!loaded_stability.converged);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/adaptive/tests/unit/storage_tests.rs` around lines 243 - 253, The
stability round-trip test loads the full stability record into the
loaded_stability variable but does not assert the converged field, which could
allow storage regressions on this field to go undetected. Add an explicit
assertion immediately after the existing stability assertions (after the
assertion for loaded_stability.total_observations) to verify the converged field
has the expected value, ensuring the field is properly persisted and retrieved
during the round-trip.
♻️ Duplicate comments (2)
crates/adaptive-topology/src/manifold.rs (2)

91-100: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use the mathematically correct embedding readiness threshold.

Line 91 and Line 121 use D * tau, but the embedding in Line 74 ([x(t), x(t-tau), ..., x(t-(D-1)tau)]) only requires (D - 1) * tau + 1 samples. Current logic rejects valid embeddings for tau > 1.

Proposed fix
     pub fn new(tau: usize) -> Self {
         let tau = if tau == 0 { DEFAULT_TAU } else { tau };
+        let required = if D == 0 {
+            0
+        } else {
+            (D - 1).saturating_mul(tau).saturating_add(1)
+        };
         assert!(
-            D * tau <= EMBED_BUFFER_CAPACITY,
-            "TimeDelayEmbedder D * tau ({}) exceeds EMBED_BUFFER_CAPACITY ({})",
-            D * tau,
+            required <= EMBED_BUFFER_CAPACITY,
+            "TimeDelayEmbedder required samples ({}) exceed EMBED_BUFFER_CAPACITY ({})",
+            required,
             EMBED_BUFFER_CAPACITY
         );
@@
     pub fn embed(&self) -> Option<ManifoldPoint<D>> {
-        let required = D * self.tau;
+        let required = if D == 0 {
+            0
+        } else {
+            (D - 1).saturating_mul(self.tau).saturating_add(1)
+        };
         if self.buffer_len < required {
             return None;
         }

Also applies to: 121-123

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/adaptive-topology/src/manifold.rs` around lines 91 - 100, The
embedding readiness threshold is mathematically incorrect. The embedding pattern
requires (D - 1) * tau + 1 samples, not D * tau. Update the assertion in the new
method (where tau is validated) to use (D - 1) * tau + 1 instead of D * tau.
Also apply the same fix to lines 121-123, which have the same incorrect
threshold check. This correction will allow valid embeddings to proceed instead
of rejecting them unnecessarily when tau is greater than 1.

225-246: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Mark vertices visited at push-time in DFS to avoid duplicate stack entries.

In Lines 239-246, neighbors are marked visited only after they are popped. That allows the same vertex to be pushed more than once, which can fill the fixed-size stack and cause reachable vertices to be skipped, producing an incorrect β₀.

Proposed fix
             components += 1;
             let mut stack = [0usize; MAX_GRAPH_POINTS];
             let mut stack_top = 1;
             stack[0] = start;
+            visited[start] = true;

             while stack_top > 0 {
                 stack_top -= 1;
                 let current = stack[stack_top];

-                if visited[current] {
-                    continue;
-                }
-                visited[current] = true;
-
-                for (neighbor, is_visited) in visited.iter().enumerate().take(self.point_count) {
-                    if !is_visited
-                        && self.are_neighbors(current, neighbor)
-                        && stack_top < MAX_GRAPH_POINTS
-                    {
+                for neighbor in 0..self.point_count {
+                    if !visited[neighbor] && self.are_neighbors(current, neighbor) {
+                        visited[neighbor] = true;
                         stack[stack_top] = neighbor;
                         stack_top += 1;
                     }
                 }
             }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/adaptive-topology/src/manifold.rs` around lines 225 - 246, The DFS
traversal marks vertices as visited only after they are popped from the stack
(when visited[current] is set to true), which allows the same vertex to be
pushed onto the stack multiple times before it gets processed. This causes
duplicate stack entries that can overflow the fixed-size stack and skip
reachable vertices, resulting in incorrect component counting. Move the visited
marking to happen when pushing neighbors onto the stack rather than when popping
them. In the neighbor iteration loop where neighbors are added to the stack
(around lines 239-246), set visited[neighbor] = true immediately before pushing
the neighbor onto the stack to prevent duplicate pushes of the same vertex.
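
The mark-at-push pattern can be sketched in isolation as follows. This is a simplified illustration, not the crate's implementation: it uses a heap-allocated adjacency matrix and a `Vec` stack instead of the fixed `MAX_GRAPH_POINTS` arrays, and the helper names are hypothetical.

```rust
/// Builds a symmetric adjacency matrix for `n` vertices from an edge list.
fn adjacency(n: usize, edges: &[(usize, usize)]) -> Vec<Vec<bool>> {
    let mut matrix = vec![vec![false; n]; n];
    for &(a, b) in edges {
        matrix[a][b] = true;
        matrix[b][a] = true;
    }
    matrix
}

/// Counts connected components with an iterative DFS that marks a vertex as
/// visited when it is pushed, so each vertex enters the stack at most once.
fn count_components(adjacent: &[Vec<bool>]) -> usize {
    let n = adjacent.len();
    let mut visited = vec![false; n];
    let mut stack = Vec::with_capacity(n);
    let mut components = 0;

    for start in 0..n {
        if visited[start] {
            continue;
        }
        components += 1;
        visited[start] = true; // mark the root before pushing it
        stack.push(start);

        while let Some(current) = stack.pop() {
            for neighbor in 0..n {
                if !visited[neighbor] && adjacent[current][neighbor] {
                    visited[neighbor] = true; // mark at push-time
                    stack.push(neighbor);
                }
            }
        }
    }
    components
}
```

Because every vertex is pushed at most once, the stack never holds more than `n` entries, which is the property that lets a fixed-size stack of the same capacity work without an overflow guard.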
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/adaptive-topology/README.md`:
- Around line 33-35: The code example block in the README.md file starts
immediately after the "## Example" heading without a complete-sentence
introduction. Add a descriptive sentence between the heading and the opening
triple backticks that explains what the code example demonstrates, following the
documentation style guideline that requires every code block to be introduced
with a complete sentence.

In `@crates/adaptive-topology/src/convergence.rs`:
- Around line 237-239: In the betti_score calculation within the
convergence_score method, the variations count is being normalized by
self.stability_window when it should be normalized by the actual number of
transitions. Since variations are counted from window[..count].windows(2), which
produces count - 1 adjacent pairs, change the denominator in the betti_score
assignment from self.stability_window to count - 1, clamped to at least 1 to
avoid division by zero, to correctly normalize the variation ratio.

In `@crates/adaptive-topology/src/drift.rs`:
- Around line 40-60: Add a validation check at the beginning of the update
method to ensure all values in the centroid array are finite (not NaN or Inf).
If any non-finite value is detected, reset the detector state and return early
with f64::INFINITY so that non-finite values are never written into the internal
state fields (previous, expected, velocity). This guards against NaN values
persisting in the drift and velocity calculations, which would disable
meaningful drift checks.

In `@crates/adaptive/tests/unit/adaptive_hints_intercept_tests.rs`:
- Around line 361-362: The current assertion only verifies that the agent hints
header is absent from the request, but it does not verify that the agent hints
data is also absent from the request body. To harden the test, add a negative
assertion that checks the request body does not contain the nvext.agent_hints
field after the existing header check on line 361. This ensures that if a
regression accidentally injects nvext.agent_hints in the body while skipping the
header, the test will catch it and fail as expected.

In `@crates/adaptive/tests/unit/runtime_tests.rs`:
- Around line 322-375: The test
validate_config_reports_invalid_topology_numeric_fields should verify
field-level diagnostics for all invalid fields within components, not just
component-level diagnostics. Currently it checks for component paths like
"adaptive_hints.governor" and "acg.convergence", but it should also assert that
diagnostics exist for specific invalid fields within those components. For the
acg.convergence and convergence components which each have multiple invalid
fields (epsilon and stability_window), add additional assertions to the
report.diagnostics iteration to verify that field-level diagnostics are reported
for paths like "acg.convergence.epsilon", "acg.convergence.stability_window",
"convergence.epsilon", and "convergence.stability_window" to ensure all invalid
numeric fields are being validated and prevent silent validator regressions.

In `@docs/adaptive-plugin/about.mdx`:
- Around line 39-40: The documentation uses inconsistent terminology for the
same concept: "topology-aware" appears in one location while "topology-inspired"
appears in another location (in the line mentioning "topology-inspired
signals"). Standardize on a single term throughout the document by identifying
all instances of both "topology-aware" and "topology-inspired" and replacing
them with one consistent term. Ensure the chosen term is applied uniformly
across the entire file to maintain clarity and avoid ambiguity for readers.

In `@docs/adaptive-plugin/configuration.mdx`:
- Around line 221-233: The Rust example for tool_parallelism configuration uses
ToolParallelismComponentConfig::default() implicitly, while the TOML, Python,
and Node examples all explicitly demonstrate the drift configuration. Update the
line setting adaptive.tool_parallelism to explicitly define the drift field
configuration instead of relying on the default() method, ensuring the Rust
example maintains parity with the other language examples in the documentation.

---

Outside diff comments:
In `@crates/adaptive/tests/unit/storage_tests.rs`:
- Around line 243-253: The stability round-trip test loads the full stability
record into the loaded_stability variable but does not assert the converged
field, which could allow storage regressions on this field to go undetected. Add
an explicit assertion immediately after the existing stability assertions (after
the assertion for loaded_stability.total_observations) to verify the converged
field has the expected value, ensuring the field is properly persisted and
retrieved during the round-trip.

---

Duplicate comments:
In `@crates/adaptive-topology/src/manifold.rs`:
- Around line 91-100: The embedding readiness threshold is mathematically
incorrect. The embedding pattern requires (D - 1) * tau + 1 samples, not D *
tau. Update the assertion in the new method (where tau is validated) to use (D -
1) * tau + 1 instead of D * tau. Also apply the same fix to lines 121-123, which
have the same incorrect threshold check. This correction will allow valid
embeddings to proceed instead of rejecting them unnecessarily when tau is
greater than 1.
- Around line 225-246: The DFS traversal marks vertices as visited only after
they are popped from the stack (when visited[current] is set to true), which
allows the same vertex to be pushed onto the stack multiple times before it gets
processed. This causes duplicate stack entries that can overflow the fixed-size
stack and skip reachable vertices, resulting in incorrect component counting.
Move the visited marking to happen when pushing neighbors onto the stack rather
than when popping them. In the neighbor iteration loop where neighbors are added
to the stack (around lines 239-246), set visited[neighbor] = true immediately
before pushing the neighbor onto the stack to prevent duplicate pushes of the
same vertex.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: ccb1a467-92d2-498a-ae76-c078c7204100

📥 Commits

Reviewing files that changed from the base of the PR and between da2470a and 24ed6f2.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (66)
  • .pre-commit-config.yaml
  • ATTRIBUTIONS-Rust.md
  • Cargo.toml
  • crates/adaptive-topology/Cargo.toml
  • crates/adaptive-topology/README.md
  • crates/adaptive-topology/src/convergence.rs
  • crates/adaptive-topology/src/drift.rs
  • crates/adaptive-topology/src/geometry.rs
  • crates/adaptive-topology/src/governor.rs
  • crates/adaptive-topology/src/lib.rs
  • crates/adaptive-topology/src/manifold.rs
  • crates/adaptive-topology/src/topology.rs
  • crates/adaptive/Cargo.toml
  • crates/adaptive/README.md
  • crates/adaptive/benches/convergence_bench.rs
  • crates/adaptive/src/acg/stability.rs
  • crates/adaptive/src/acg_learner.rs
  • crates/adaptive/src/adaptive_hints_intercept.rs
  • crates/adaptive/src/config.rs
  • crates/adaptive/src/lib.rs
  • crates/adaptive/src/plugin_component.rs
  • crates/adaptive/src/runtime/features.rs
  • crates/adaptive/src/runtime/validation.rs
  • crates/adaptive/src/tool_parallelism_learner.rs
  • crates/adaptive/tests/integration/runtime_integration_tests.rs
  • crates/adaptive/tests/integration/tool_parallelism_plan_tests.rs
  • crates/adaptive/tests/integration/topology_convergence_tests.rs
  • crates/adaptive/tests/unit/acg/economics_internal_tests.rs
  • crates/adaptive/tests/unit/acg/economics_policy_tests.rs
  • crates/adaptive/tests/unit/acg/multi_breakpoint_tests.rs
  • crates/adaptive/tests/unit/acg_component_tests.rs
  • crates/adaptive/tests/unit/adaptive_hints_intercept_tests.rs
  • crates/adaptive/tests/unit/cache_diagnostics_tests.rs
  • crates/adaptive/tests/unit/config_tests.rs
  • crates/adaptive/tests/unit/intercepts_tests.rs
  • crates/adaptive/tests/unit/plugin_component_tests.rs
  • crates/adaptive/tests/unit/runtime_features_tests.rs
  • crates/adaptive/tests/unit/runtime_tests.rs
  • crates/adaptive/tests/unit/storage_memory_internal_tests.rs
  • crates/adaptive/tests/unit/storage_tests.rs
  • crates/adaptive/tests/unit/tool_parallelism_learner_tests.rs
  • crates/adaptive/tests/unit/types_tests.rs
  • crates/node/adaptive.d.ts
  • crates/node/adaptive.js
  • crates/node/tests/adaptive_tests.mjs
  • crates/python/Cargo.toml
  • crates/python/src/lib.rs
  • crates/python/src/py_adaptive_topology.rs
  • crates/python/tests/coverage/py_storage_coverage_tests.rs
  • docs/adaptive-plugin/about.mdx
  • docs/adaptive-plugin/acg.mdx
  • docs/adaptive-plugin/adaptive-hints.mdx
  • docs/adaptive-plugin/configuration.mdx
  • go/nemo_relay/adaptive.go
  • go/nemo_relay/adaptive/adaptive.go
  • go/nemo_relay/adaptive_test.go
  • python/nemo_relay/__init__.py
  • python/nemo_relay/__init__.pyi
  • python/nemo_relay/_native.pyi
  • python/nemo_relay/adaptive.py
  • python/nemo_relay/adaptive.pyi
  • python/nemo_relay/adaptive_topology.py
  • python/nemo_relay/adaptive_topology.pyi
  • python/tests/test_adaptive.py
  • python/tests/test_adaptive_config.py
  • python/tests/test_adaptive_topology.py

Comment thread crates/adaptive-topology/README.md Outdated
Comment on lines +237 to +239
let variations = window[..count].windows(2).filter(|w| w[0] != w[1]).count();
let betti_score = 1.0 - (variations as f64 / self.stability_window as f64);
score += BETTI_STABILITY_WEIGHT * betti_score;

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Normalize Betti variation by transition count, not window length.

Line 238 divides by self.stability_window, but variations are counted across adjacent pairs, of which there are only count - 1. Dividing by the larger window length systematically overstates Betti stability in convergence_score.

Suggested fix
 let variations = window[..count].windows(2).filter(|w| w[0] != w[1]).count();
-let betti_score = 1.0 - (variations as f64 / self.stability_window as f64);
+let transitions = count.saturating_sub(1).max(1);
+let betti_score = 1.0 - (variations as f64 / transitions as f64);
 score += BETTI_STABILITY_WEIGHT * betti_score;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/adaptive-topology/src/convergence.rs` around lines 237 - 239, In the
betti_score calculation within the convergence_score method, the variations
count is being normalized by self.stability_window when it should be normalized
by the actual number of transitions. Since variations are counted from
window[..count].windows(2), which produces count - 1 adjacent pairs, change the
denominator in the betti_score assignment from self.stability_window to
count - 1, clamped to at least 1 to avoid division by zero, to correctly
normalize the variation ratio.
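
A small standalone sketch shows how much the two denominators can differ on a partially filled window. The function names and the use of `u32` as a stand-in for the Betti signature type are assumptions for illustration only.

```rust
/// Original normalization: divides by the configured window length.
fn betti_score_by_window(window: &[u32], stability_window: usize) -> f64 {
    let variations = window.windows(2).filter(|w| w[0] != w[1]).count();
    1.0 - (variations as f64 / stability_window as f64)
}

/// Suggested normalization: divides by the number of adjacent pairs,
/// clamped to at least 1 so a single-sample window cannot divide by zero.
fn betti_score_by_transitions(window: &[u32]) -> f64 {
    let variations = window.windows(2).filter(|w| w[0] != w[1]).count();
    let transitions = window.len().saturating_sub(1).max(1);
    1.0 - (variations as f64 / transitions as f64)
}
```

With two observed signatures that differ and a configured window of 8, the window-length version reports 0.875 even though every observed transition changed, while the transition-count version reports 0.0.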

Comment thread crates/adaptive-topology/src/drift.rs Outdated
Comment on lines +40 to +60
pub fn update(&mut self, centroid: &[f64; D]) -> f64 {
let drift = if self.has_previous {
l2_distance(&self.expected, centroid)
} else {
0.0
};

if self.has_previous {
for (d, vel) in self.velocity.iter_mut().enumerate().take(D) {
*vel = centroid[d] - self.previous[d];
}
}

for (d, exp) in self.expected.iter_mut().enumerate().take(D) {
*exp = centroid[d] + self.velocity[d];
}

self.previous = *centroid;
self.has_previous = true;

drift

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard update against non-finite centroids to prevent persistent NaN state.

At Line 40, non-finite centroid values are accepted and then written into previous/expected/velocity. After that, drift and velocity calculations can stay NaN and effectively disable meaningful drift checks.

Suggested fix
 pub fn update(&mut self, centroid: &[f64; D]) -> f64 {
+    if centroid.iter().any(|coord| !coord.is_finite()) {
+        self.reset();
+        return f64::INFINITY;
+    }
+
     let drift = if self.has_previous {
         l2_distance(&self.expected, centroid)
     } else {
         0.0
     };
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/adaptive-topology/src/drift.rs` around lines 40 - 60, Add a validation
check at the beginning of the update method to ensure all values in the centroid
array are finite (not NaN or Inf). If any non-finite value is detected, reset
the detector state and return early with f64::INFINITY, matching the suggested
fix above, so that non-finite values are never written into the internal state
fields (previous, expected, velocity). This guards against NaN values persisting
in the drift and velocity calculations, which would disable meaningful drift
checks.
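
The guard can be exercised end to end with a self-contained sketch of the detector. The struct layout mirrors the fields named in this comment, but the `new` constructor, the body of `reset`, and the `l2_distance` helper are assumptions, since their real definitions are not shown here. The guard follows the suggested fix: reset, then return `f64::INFINITY`.

```rust
/// Minimal stand-in for the drift detector discussed above.
struct DriftDetector<const D: usize> {
    previous: [f64; D],
    expected: [f64; D],
    velocity: [f64; D],
    has_previous: bool,
}

impl<const D: usize> DriftDetector<D> {
    fn new() -> Self {
        Self {
            previous: [0.0; D],
            expected: [0.0; D],
            velocity: [0.0; D],
            has_previous: false,
        }
    }

    /// Assumed semantics: return to the freshly constructed state.
    fn reset(&mut self) {
        *self = Self::new();
    }

    fn update(&mut self, centroid: &[f64; D]) -> f64 {
        // Guard: never let NaN or infinity reach the stored state.
        if centroid.iter().any(|coord| !coord.is_finite()) {
            self.reset();
            return f64::INFINITY;
        }
        let drift = if self.has_previous {
            l2_distance(&self.expected, centroid)
        } else {
            0.0
        };
        if self.has_previous {
            for d in 0..D {
                self.velocity[d] = centroid[d] - self.previous[d];
            }
        }
        // Linear extrapolation: next expected position is current + velocity.
        for d in 0..D {
            self.expected[d] = centroid[d] + self.velocity[d];
        }
        self.previous = *centroid;
        self.has_previous = true;
        drift
    }
}

/// Euclidean distance between two fixed-size points.
fn l2_distance<const D: usize>(a: &[f64; D], b: &[f64; D]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f64>().sqrt()
}
```

Returning infinity rather than 0.0 means a caller that compares drift against a threshold treats the bad sample as maximal drift instead of silently reporting stability, and the reset ensures the next finite sample starts from a clean state.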

Comment thread crates/adaptive/tests/unit/adaptive_hints_intercept_tests.rs
Comment thread crates/adaptive/tests/unit/runtime_tests.rs
Comment thread docs/adaptive-plugin/about.mdx Outdated
Comment thread docs/adaptive-plugin/configuration.mdx Outdated
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Teerth Sharma <teerths57@gmail.com>

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/adaptive-plugin/about.mdx`:
- Around line 39-40: The file docs/adaptive-plugin/about.mdx is missing the
required SPDX license header at the top. Add the SPDX header using JSX comment
delimiters {/* ... */} at the very beginning of the file before the YAML
frontmatter. The header should include the SPDX-FileCopyrightText line with the
copyright notice and the SPDX-License-Identifier line set to Apache-2.0, wrapped
in the JSX comment syntax as specified in the review comment.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 289cbccd-452e-4a63-9da2-9d593edfa1a4

📥 Commits

Reviewing files that changed from the base of the PR and between 24ed6f2 and 6fc3d5e.

📒 Files selected for processing (1)
  • docs/adaptive-plugin/about.mdx
📜 Review details
🧰 Additional context used
📓 Path-based instructions (12)
{docs/**,README.md,CONTRIBUTING.md}

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

{docs/**,README.md,CONTRIBUTING.md}: For docs-only changes, run targeted checks only if commands, package names, or examples changed. Use just docs for docs-site builds and just docs-linkcheck when links changed
Run docs site build with just docs

Files:

  • docs/adaptive-plugin/about.mdx
{docs/**,README.md,CONTRIBUTING.md,**/*.md}

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

Run docs link validation with just docs-linkcheck when links change

Files:

  • docs/adaptive-plugin/about.mdx
{docs/**,README.md}

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

Verify README and docs entry points still match current package names and paths for large or public-facing changes

Files:

  • docs/adaptive-plugin/about.mdx
{docs/**,examples/**,README.md}

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

Verify examples still run with documented commands for large or public-facing changes

Files:

  • docs/adaptive-plugin/about.mdx
{docs/**,README.md,**/Cargo.toml,**/package.json,**/*.md}

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

Ensure renamed public surfaces are reflected consistently in manifests and docs for large or public-facing changes

Files:

  • docs/adaptive-plugin/about.mdx
**/*.{md,mdx,py,sh,yaml,yml,toml,json}

📄 CodeRabbit inference engine (.agents/skills/contribute-docs/SKILL.md)

Keep package names, repo references, and build commands current

Files:

  • docs/adaptive-plugin/about.mdx
**/*.mdx

📄 CodeRabbit inference engine (.agents/skills/contribute-docs/SKILL.md)

In MDX files, top-of-file comments must use JSX comment delimiters: {/* to open and */} to close. Do not use HTML comments for MDX SPDX headers.

MDX top-of-file SPDX comments must use {/* ... */} delimiters instead of HTML comment delimiters (Must-Fix)

Files:

  • docs/adaptive-plugin/about.mdx
**/*.{html,md,mdx}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Include SPDX license header in HTML and Markdown files using HTML comment syntax

Files:

  • docs/adaptive-plugin/about.mdx
docs/**/*.{md,mdx}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Update embedded documentation snippets, patch docs, and binding-support notes if examples or supported bindings changed

Files:

  • docs/adaptive-plugin/about.mdx
docs/**

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Run just docs or ./scripts/build-docs.sh html to regenerate ignored Fern API reference pages before validation for documentation site changes

Files:

  • docs/adaptive-plugin/about.mdx
{docs/**,README.md,CONTRIBUTING.md,RELEASING.md,SECURITY.md}

⚙️ CodeRabbit configuration file

{docs/**,README.md,CONTRIBUTING.md,RELEASING.md,SECURITY.md}: Review documentation for technical accuracy against the current API, command correctness, and consistency across language bindings.
Flag stale examples, missing SPDX headers where required, and instructions that no longer match CI or pre-commit behavior.

Files:

  • docs/adaptive-plugin/about.mdx
**

⚙️ CodeRabbit configuration file

**:

AGENTS.md

This file provides guidance to agents, including Claude Code and OpenAI Codex, when working in this repository.

Project Overview

NeMo Relay is a multi-language agent runtime framework for execution scopes, lifecycle events, middleware, plugins, and observability around tool and LLM calls. The core runtime is Rust. Primary supported bindings are Rust, Python, and Node.js. Go, WebAssembly, and the raw C FFI are experimental and source-first.

The shared runtime model is:

  1. Scope stacks decide where work belongs and which scope-local behavior is visible.
  2. Middleware registries decide what guardrails and intercepts run around managed calls.
  3. Plugins install reusable runtime behavior from configuration.
  4. Events record runtime behavior in ATOF form.
  5. Subscribers and exporters consume events in-process or export them to ATIF, OpenTelemetry, OpenInference, or other backends.

Repository Structure

The repository layout separates the Rust runtime, language bindings, documentation,
integration patches, and agent-facing skills.

crates/
  core/       # Rust core runtime crate, published as nemo-relay
  adaptive/   # Adaptive runtime primitives and plugin components
  python/     # PyO3 native extension for the Python package
  ffi/        # Raw C ABI layer used by downstream bindings such as Go
  node/       # NAPI Node.js binding and JavaScript/TypeScript entry points
  wasm/       # wasm-bindgen WebAssembly binding and JS wrappers
python/
  nemo_relay/  # Python wrapper package: scopes, tools, LLM, middleware, typed helpers, plugins, adaptive helpers
  tests/      # Python tests
go/
  nemo_relay/  # Experimental Go CGo binding and tests
fern/         # Fern documentation site
scripts/      # Stable wrappers and helper scripts; build/test/docs entry points live in justfile
third_party/  # P...

Files:

  • docs/adaptive-plugin/about.mdx
🔇 Additional comments (2)
docs/adaptive-plugin/about.mdx (2)

39-40: Terminology consistency already addressed.

The past review flagged inconsistency between "topology-aware" and "topology-inspired" at line 39. The current code correctly uses "topology-aware signals stabilize," which aligns with terminology used earlier in the document and throughout the adaptive plugin docs. This concern has been resolved.


39-40: New documentation bullets are technically accurate and well-motivated.

  • Line 39: "Stop adaptive learning once observed topology-aware signals stabilize" directly reflects convergence detection via Betti-number stability (convergence detector in PR).
  • Line 40: "Invalidate stale tool parallelism plans when cohort shape drifts" aligns with drift-based plan invalidation behavior implemented in tool-parallelism learner.

Both bullets are clear, actionable, and grounded in the topology-aware adaptive feature set.

Comment thread docs/adaptive-plugin/about.mdx

@willkill07 willkill07 left a comment

Member

Without a design document and clear measurable and observable improvements, I am extremely reluctant to spend the time to review this PR.

Additionally:

  • no tests should be embedded in source files
  • no new crates should be added if the behavior is internal only (which it sounds like adaptive-topology is)

Signed-off-by: Teerth Sharma <teerths57@gmail.com>
@teerthsharma

teerthsharma commented Jun 22, 2026

Author

Addressed the review concerns in 0f56103822edad153910d83fa463dafcae13a9c1, ad6b6d06920ddb98bdaec69902ad33ef07cc3496, and 1036c12d8d5ae3b802fe29d39b70465bacf458ef:

  • added the reviewer-facing design doc at docs/adaptive-plugin/topology-aware-controls-design.mdx with measurable outcomes and observable signals
  • folded the topology controls into nemo-relay-adaptive instead of adding a new crate
  • removed the proposed public Python topology primitive surface
  • moved the new primitive tests out of source files into crates/adaptive/tests/unit/topology_tests.rs
  • hardened the hint-governor test to assert both header and body omission when hints are shed
  • applied the MDX SPDX placement requested by the review bot for docs/adaptive-plugin/about.mdx
  • tightened internal convergence so it requires a full stable window across Betti signatures, finite non-increasing drift, and low error; also clamps the stability window and resets drift state on non-finite centroids

Validation run locally:

  • cargo fmt --all
  • cargo test -p nemo-relay-adaptive topology -- --nocapture
  • cargo test -p nemo-relay-adaptive --test topology_convergence -- --nocapture
  • cargo test -p nemo-relay-adaptive --test tool_parallelism_plan -- --nocapture
  • cargo test -p nemo-relay-adaptive adaptive_hints_governor -- --nocapture
  • cargo test -p nemo-relay-adaptive --lib (484 tests after the convergence fix)
  • cargo check -p nemo-relay-python
  • cargo clippy -p nemo-relay-adaptive --all-targets -- -D warnings
  • cd fern && npx fern check --warnings (passes with the expected unauthenticated redirects warning)

Two checks are still blocked locally: uv run pytest python/tests/test_adaptive.py python/tests/test_adaptive_config.py -q fails before collection with an access-denied error on .venv\lib64, and just docs fails in this Windows shell because the recipe cannot find cygpath. The direct Fern check above passed.

Signed-off-by: Teerth Sharma <teerths57@gmail.com>

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/adaptive/src/tool_parallelism_learner.rs (1)

73-75: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Drift invalidation should use an inclusive threshold.

At Line 74, using > misses invalidation when drift is exactly at threshold.

Suggested fix
-        Ok(drift > config.threshold)
+        Ok(drift >= config.threshold)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/adaptive/src/tool_parallelism_learner.rs` around lines 73 - 75, In the
return statement within the method that updates drift using
detector.update(&centroid), change the comparison operator from > to >= when
comparing drift against config.threshold. This ensures that invalidation occurs
when drift equals the threshold value exactly, not just when it exceeds it.
Update the condition Ok(drift > config.threshold) to use the inclusive >=
operator instead.
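
The boundary case behind this suggestion can be shown with a small sketch. The helper names below are hypothetical and stand in for the comparison inside the learner; only the `>` versus `>=` difference is taken from the review comment.

```rust
/// Hypothetical helper: the original, exclusive comparison.
pub fn invalidate_exclusive(drift: f64, threshold: f64) -> bool {
    drift > threshold
}

/// Hypothetical helper: the suggested, inclusive comparison.
pub fn invalidate_inclusive(drift: f64, threshold: f64) -> bool {
    // Drift exactly at the threshold now counts as invalidating.
    // NaN still compares false, so a non-finite drift never invalidates here.
    drift >= threshold
}
```

The two helpers differ only when drift equals the threshold exactly, which is the case the comment says is currently missed.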
♻️ Duplicate comments (1)
crates/adaptive/src/acg_learner.rs (1)

186-206: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Persisting stability before observations can strand profiles in a false converged state.

At Line 227, converged stability is stored before observations (Line 235). If observation storage fails, the next run can hit the converged fast-path at Line 186 and skip repair permanently.

Suggested fix
-                if existing_stability
+                if existing_stability
                     .as_ref()
                     .map(|stability| stability.converged)
                     .unwrap_or(false)
+                    && existing.is_some()
                 {
@@
-                if self.record_stability_epoch(&profile_key, &stability_result)? {
-                    stability_result.converged = true;
-                }
-
-                backend
-                    .store_stability(&profile_key, &stability_result)
-                    .await?;
+                let converged_now = self.record_stability_epoch(&profile_key, &stability_result)?;
@@
                 backend
                     .store_observations(&profile_key, &observations_vec)
                     .await?;
+
+                if converged_now {
+                    stability_result.converged = true;
+                }
+                backend
+                    .store_stability(&profile_key, &stability_result)
+                    .await?;

Also applies to: 226-236

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/adaptive/src/acg_learner.rs` around lines 186 - 206, The issue is that
stability state is persisted before observations are persisted, which creates a
partial-failure hazard: if observation persistence fails, the converged flag in
stability remains true, so the fast-path at the beginning of the function
(checking whether existing_stability is converged) incorrectly skips repair on
the next run. Fix this by reversing the order of persistence operations so that
observations are stored before stability is persisted, ensuring that if
observation storage fails, the converged state remains false and repair will be
attempted on the next execution.
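
The ordering fix described above can be sketched with a synchronous in-memory stand-in. Everything here is hypothetical: `MemoryBackend`, `persist_epoch`, and `can_skip_observation` are illustrative names, and the real backend in the PR is async; only the "observations first, stability second" ordering comes from the review.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, Default, PartialEq)]
pub struct Stability {
    pub converged: bool,
}

/// Hypothetical in-memory backend; `fail_observations` simulates a storage error.
#[derive(Default)]
pub struct MemoryBackend {
    pub observations: HashMap<String, Vec<u64>>,
    pub stability: HashMap<String, Stability>,
    pub fail_observations: bool,
}

impl MemoryBackend {
    pub fn store_observations(&mut self, key: &str, obs: &[u64]) -> Result<(), String> {
        if self.fail_observations {
            return Err("observation store unavailable".to_string());
        }
        self.observations.insert(key.to_string(), obs.to_vec());
        Ok(())
    }

    pub fn store_stability(&mut self, key: &str, stability: &Stability) -> Result<(), String> {
        self.stability.insert(key.to_string(), stability.clone());
        Ok(())
    }
}

/// Persist observations first; only then persist the converged flag.
pub fn persist_epoch(
    backend: &mut MemoryBackend,
    key: &str,
    obs: &[u64],
    converged_now: bool,
) -> Result<(), String> {
    // If this fails, `?` returns early and no stability record is written.
    backend.store_observations(key, obs)?;
    let stability = Stability { converged: converged_now };
    backend.store_stability(key, &stability)
}

/// The fast path trusts cached convergence only when observations exist.
pub fn can_skip_observation(backend: &MemoryBackend, key: &str) -> bool {
    let converged = backend
        .stability
        .get(key)
        .map(|s| s.converged)
        .unwrap_or(false);
    let has_observations = backend
        .observations
        .get(key)
        .map(|o| !o.is_empty())
        .unwrap_or(false);
    converged && has_observations
}
```

In this sketch a failed observation write leaves no `converged = true` record behind, so the next run cannot take the fast path on missing data.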
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/adaptive/src/topology.rs`:
- Around line 135-147: The convergence check in the is_converged method
currently uses a short-circuit OR operator that can declare convergence based on
insufficient evidence, and the is_error_converged method only checks if the last
single error value is below epsilon rather than verifying stability over a
configured window. To fix this, modify is_error_converged to check that the last
N error samples in error_history are all consistently below epsilon threshold to
guarantee convergence over a window period, rather than checking only the latest
point. Additionally, change the logic operator in is_converged from OR to AND to
ensure that error convergence, betti stability, and drift decrease all must be
satisfied together before declaring the topology converged.

In `@crates/adaptive/tests/unit/topology_tests.rs`:
- Around line 27-33: The test function error_below_epsilon_converges expects
convergence after only a single low-error epoch, which may not align with the
intended window-based stability check. If the ConvergenceDetector requires a
stability window (indicated by the 3 parameter in ConvergenceDetector::new(0.01,
3)), then the test needs to record multiple consecutive epochs with low error
values before asserting is_converged() returns true. Either call record_epoch
multiple times with the same low-error pattern to fill the window, or change the
assertion to expect false convergence after a single epoch to properly test
window-based behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 2c6e25e2-9b5b-49f7-8fac-0fcddc8494b7

📥 Commits

Reviewing files that changed from the base of the PR and between 6fc3d5e and 0f56103.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (11)
  • crates/adaptive/Cargo.toml
  • crates/adaptive/src/acg_learner.rs
  • crates/adaptive/src/adaptive_hints_intercept.rs
  • crates/adaptive/src/lib.rs
  • crates/adaptive/src/tool_parallelism_learner.rs
  • crates/adaptive/src/topology.rs
  • crates/adaptive/tests/unit/topology_tests.rs
  • docs/adaptive-plugin/about.mdx
  • docs/adaptive-plugin/acg.mdx
  • docs/adaptive-plugin/configuration.mdx
  • docs/adaptive-plugin/topology-aware-controls-design.mdx
💤 Files with no reviewable changes (1)
  • crates/adaptive/Cargo.toml
📜 Review details
🧰 Additional context used
📓 Path-based instructions (26)
{docs/**,README.md,CONTRIBUTING.md}

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

{docs/**,README.md,CONTRIBUTING.md}: For docs-only changes, run targeted checks only if commands, package names, or examples changed. Use just docs for docs-site builds and just docs-linkcheck when links changed
Run docs site build with just docs

Files:

  • docs/adaptive-plugin/about.mdx
  • docs/adaptive-plugin/topology-aware-controls-design.mdx
  • docs/adaptive-plugin/acg.mdx
  • docs/adaptive-plugin/configuration.mdx
{docs/**,README.md,CONTRIBUTING.md,**/*.md}

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

Run docs link validation with just docs-linkcheck when links change

Files:

  • docs/adaptive-plugin/about.mdx
  • docs/adaptive-plugin/topology-aware-controls-design.mdx
  • docs/adaptive-plugin/acg.mdx
  • docs/adaptive-plugin/configuration.mdx
{docs/**,README.md}

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

Verify README and docs entry points still match current package names and paths for large or public-facing changes

Files:

  • docs/adaptive-plugin/about.mdx
  • docs/adaptive-plugin/topology-aware-controls-design.mdx
  • docs/adaptive-plugin/acg.mdx
  • docs/adaptive-plugin/configuration.mdx
{docs/**,examples/**,README.md}

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

Verify examples still run with documented commands for large or public-facing changes

Files:

  • docs/adaptive-plugin/about.mdx
  • docs/adaptive-plugin/topology-aware-controls-design.mdx
  • docs/adaptive-plugin/acg.mdx
  • docs/adaptive-plugin/configuration.mdx
{docs/**,README.md,**/Cargo.toml,**/package.json,**/*.md}

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

Ensure renamed public surfaces are reflected consistently in manifests and docs for large or public-facing changes

Files:

  • docs/adaptive-plugin/about.mdx
  • docs/adaptive-plugin/topology-aware-controls-design.mdx
  • docs/adaptive-plugin/acg.mdx
  • docs/adaptive-plugin/configuration.mdx
**/*.{md,mdx,py,sh,yaml,yml,toml,json}

📄 CodeRabbit inference engine (.agents/skills/contribute-docs/SKILL.md)

Keep package names, repo references, and build commands current

Files:

  • docs/adaptive-plugin/about.mdx
  • docs/adaptive-plugin/topology-aware-controls-design.mdx
  • docs/adaptive-plugin/acg.mdx
  • docs/adaptive-plugin/configuration.mdx
**/*.mdx

📄 CodeRabbit inference engine (.agents/skills/contribute-docs/SKILL.md)

In MDX files, top-of-file comments must use JSX comment delimiters: {/* to open and */} to close. Do not use HTML comments for MDX SPDX headers.

MDX top-of-file SPDX comments must use {/* ... */} delimiters instead of HTML comment delimiters (Must-Fix)

Files:

  • docs/adaptive-plugin/about.mdx
  • docs/adaptive-plugin/topology-aware-controls-design.mdx
  • docs/adaptive-plugin/acg.mdx
  • docs/adaptive-plugin/configuration.mdx
**/*.{html,md,mdx}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Include SPDX license header in HTML and Markdown files using HTML comment syntax

Files:

  • docs/adaptive-plugin/about.mdx
  • docs/adaptive-plugin/topology-aware-controls-design.mdx
  • docs/adaptive-plugin/acg.mdx
  • docs/adaptive-plugin/configuration.mdx
docs/**/*.{md,mdx}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Update embedded documentation snippets, patch docs, and binding-support notes if examples or supported bindings changed

Files:

  • docs/adaptive-plugin/about.mdx
  • docs/adaptive-plugin/topology-aware-controls-design.mdx
  • docs/adaptive-plugin/acg.mdx
  • docs/adaptive-plugin/configuration.mdx
docs/**

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Run just docs or ./scripts/build-docs.sh html to regenerate ignored Fern API reference pages before validation for documentation site changes

Files:

  • docs/adaptive-plugin/about.mdx
  • docs/adaptive-plugin/topology-aware-controls-design.mdx
  • docs/adaptive-plugin/acg.mdx
  • docs/adaptive-plugin/configuration.mdx
{docs/**,README.md,CONTRIBUTING.md,RELEASING.md,SECURITY.md}

⚙️ CodeRabbit configuration file

{docs/**,README.md,CONTRIBUTING.md,RELEASING.md,SECURITY.md}: Review documentation for technical accuracy against the current API, command correctness, and consistency across language bindings.
Flag stale examples, missing SPDX headers where required, and instructions that no longer match CI or pre-commit behavior.

Files:

  • docs/adaptive-plugin/about.mdx
  • docs/adaptive-plugin/topology-aware-controls-design.mdx
  • docs/adaptive-plugin/acg.mdx
  • docs/adaptive-plugin/configuration.mdx
**

⚙️ CodeRabbit configuration file

**:

AGENTS.md

This file provides guidance to agents, including Claude Code and OpenAI Codex, when working in this repository.

Project Overview

NeMo Relay is a multi-language agent runtime framework for execution scopes, lifecycle events, middleware, plugins, and observability around tool and LLM calls. The core runtime is Rust. Primary supported bindings are Rust, Python, and Node.js. Go, WebAssembly, and the raw C FFI are experimental and source-first.

The shared runtime model is:

  1. Scope stacks decide where work belongs and which scope-local behavior is visible.
  2. Middleware registries decide what guardrails and intercepts run around managed calls.
  3. Plugins install reusable runtime behavior from configuration.
  4. Events record runtime behavior in ATOF form.
  5. Subscribers and exporters consume events in-process or export them to ATIF, OpenTelemetry, OpenInference, or other backends.

Repository Structure

The repository layout separates the Rust runtime, language bindings, documentation,
integration patches, and agent-facing skills.

crates/
  core/       # Rust core runtime crate, published as nemo-relay
  adaptive/   # Adaptive runtime primitives and plugin components
  python/     # PyO3 native extension for the Python package
  ffi/        # Raw C ABI layer used by downstream bindings such as Go
  node/       # NAPI Node.js binding and JavaScript/TypeScript entry points
  wasm/       # wasm-bindgen WebAssembly binding and JS wrappers
python/
  nemo_relay/  # Python wrapper package: scopes, tools, LLM, middleware, typed helpers, plugins, adaptive helpers
  tests/      # Python tests
go/
  nemo_relay/  # Experimental Go CGo binding and tests
fern/         # Fern documentation site
scripts/      # Stable wrappers and helper scripts; build/test/docs entry points live in justfile
third_party/  # P...

Files:

  • docs/adaptive-plugin/about.mdx
  • docs/adaptive-plugin/topology-aware-controls-design.mdx
  • crates/adaptive/tests/unit/topology_tests.rs
  • docs/adaptive-plugin/acg.mdx
  • docs/adaptive-plugin/configuration.mdx
  • crates/adaptive/src/topology.rs
  • crates/adaptive/src/adaptive_hints_intercept.rs
  • crates/adaptive/src/tool_parallelism_learner.rs
  • crates/adaptive/src/lib.rs
  • crates/adaptive/src/acg_learner.rs
**/*.rs

📄 CodeRabbit inference engine (.agents/skills/add-binding-feature/SKILL.md)

Use snake_case naming convention for Rust identifiers (e.g., nemo_relay_tool_call)

**/*.rs: Any Rust change must run just test-rust
Any Rust change must run cargo fmt --all
Any Rust change must run cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Run cargo fmt --all for all FFI work since it is Rust work
Run just test-rust to validate FFI changes
Run cargo clippy --workspace --all-targets -- -D warnings to enforce strict linting on FFI work

When Rust files changed as part of Go work, also run cargo fmt --all, just test-rust, and cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Run cargo fmt --all when Rust files are changed as part of Node work
Run cargo clippy --workspace --all-targets -- -D warnings when Rust files are changed as part of Node work
Run just test-rust when Rust files are changed as part of Node work

**/*.rs: Run cargo fmt --all to format all Rust code
Run cargo clippy --workspace --all-targets -- -D warnings to enforce all clippy lints as errors

**/*.rs: Run cargo fmt --all when Rust files changed as part of WebAssembly work
Run cargo clippy --workspace --all-targets -- -D warnings when Rust files changed as part of WebAssembly work

**/*.rs: If any Rust code changed, always run just test-rust
If any Rust code changed, also run cargo fmt --all
If any Rust code changed, also run cargo clippy --workspace --all-targets -- -D warnings
Run Rust formatting with cargo fmt --all
Run Rust linting with cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Use cargo fmt for Rust code formatting
Run cargo clippy -- -D warnings to lint Rust code and treat all warnings as errors
Use Rust snake_case naming convention for Rust identifiers
Include SPDX license header in all Rust source files using double-slash comment syntax
Validate Rust code with uv run pre-commit run --all-files to enforce cargo fmt formatting check, cargo clippy lints, and cargo deny aud...

Files:

  • crates/adaptive/tests/unit/topology_tests.rs
  • crates/adaptive/src/topology.rs
  • crates/adaptive/src/adaptive_hints_intercept.rs
  • crates/adaptive/src/tool_parallelism_learner.rs
  • crates/adaptive/src/lib.rs
  • crates/adaptive/src/acg_learner.rs
{crates/adaptive/**,python/nemo_relay/adaptive.py,python/nemo_relay/plugin.py,go/nemo_relay/adaptive/**,go/nemo_relay/!(adaptive)/**,**/node/**,**/wasm/**}

📄 CodeRabbit inference engine (.agents/skills/maintain-optimizer/SKILL.md)

Keep adaptive surface in sync across crates/adaptive, shared plugin behavior in core and bindings, Python adaptive/plugin wrappers in python/nemo_relay/adaptive.py and python/nemo_relay/plugin.py, Go adaptive helpers under go/nemo_relay/adaptive plus shared plugin helpers in go/nemo_relay, and Node/WebAssembly adaptive helpers and plugin wrappers

Files:

  • crates/adaptive/tests/unit/topology_tests.rs
  • crates/adaptive/src/topology.rs
  • crates/adaptive/src/adaptive_hints_intercept.rs
  • crates/adaptive/src/tool_parallelism_learner.rs
  • crates/adaptive/src/lib.rs
  • crates/adaptive/src/acg_learner.rs
{crates/adaptive/**,python/nemo_relay/plugin.py,go/nemo_relay/**,**/node/**,**/wasm/**}

📄 CodeRabbit inference engine (.agents/skills/maintain-optimizer/SKILL.md)

{crates/adaptive/**,python/nemo_relay/plugin.py,go/nemo_relay/**,**/node/**,**/wasm/**}: Maintain consistent plugin lifecycle across all language bindings (Python, Go, Node/WebAssembly, and Rust)
Keep plugin context surfaces aligned across all language implementations

Files:

  • crates/adaptive/tests/unit/topology_tests.rs
  • crates/adaptive/src/topology.rs
  • crates/adaptive/src/adaptive_hints_intercept.rs
  • crates/adaptive/src/tool_parallelism_learner.rs
  • crates/adaptive/src/lib.rs
  • crates/adaptive/src/acg_learner.rs
{crates/adaptive/**/*.rs,**/*test*.{rs,py,go,ts,js},**/*adaptive*test*.{rs,py,go,ts,js},docs/plugins/adaptive/**}

📄 CodeRabbit inference engine (.agents/skills/maintain-optimizer/SKILL.md)

Maintain documented and tested validation and report behavior for adaptive surfaces

Files:

  • crates/adaptive/tests/unit/topology_tests.rs
  • crates/adaptive/src/topology.rs
  • crates/adaptive/src/adaptive_hints_intercept.rs
  • crates/adaptive/src/tool_parallelism_learner.rs
  • crates/adaptive/src/lib.rs
  • crates/adaptive/src/acg_learner.rs
**/{Cargo.toml,**/*.rs}

📄 CodeRabbit inference engine (.agents/skills/maintain-packaging/SKILL.md)

Maintain consistency between Rust package names in Cargo.toml and their actual usage across the codebase

Files:

  • crates/adaptive/tests/unit/topology_tests.rs
  • crates/adaptive/src/topology.rs
  • crates/adaptive/src/adaptive_hints_intercept.rs
  • crates/adaptive/src/tool_parallelism_learner.rs
  • crates/adaptive/src/lib.rs
  • crates/adaptive/src/acg_learner.rs
**/*.{h,hpp,c,cpp,rs}

📄 CodeRabbit inference engine (.agents/skills/maintain-packaging/SKILL.md)

Ensure FFI header and library naming follows consistent conventions across platform-specific builds

Files:

  • crates/adaptive/tests/unit/topology_tests.rs
  • crates/adaptive/src/topology.rs
  • crates/adaptive/src/adaptive_hints_intercept.rs
  • crates/adaptive/src/tool_parallelism_learner.rs
  • crates/adaptive/src/lib.rs
  • crates/adaptive/src/acg_learner.rs
{crates/core,crates/adaptive}/**/*

📄 CodeRabbit inference engine (.agents/skills/prepare-pr/SKILL.md)

Changes to crates/core or crates/adaptive must run the full language matrix

Files:

  • crates/adaptive/tests/unit/topology_tests.rs
  • crates/adaptive/src/topology.rs
  • crates/adaptive/src/adaptive_hints_intercept.rs
  • crates/adaptive/src/tool_parallelism_learner.rs
  • crates/adaptive/src/lib.rs
  • crates/adaptive/src/acg_learner.rs
**/*.{rs,toml}

📄 CodeRabbit inference engine (.agents/skills/rename-surfaces/SKILL.md)

Update Rust crate names and module prefixes during coordinated rename operations

Files:

  • crates/adaptive/tests/unit/topology_tests.rs
  • crates/adaptive/src/topology.rs
  • crates/adaptive/src/adaptive_hints_intercept.rs
  • crates/adaptive/src/tool_parallelism_learner.rs
  • crates/adaptive/src/lib.rs
  • crates/adaptive/src/acg_learner.rs
crates/{core,adaptive}/**

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

If crates/core or crates/adaptive changed, run the full matrix across Rust, Python, Go, Node.js, and WebAssembly

Files:

  • crates/adaptive/tests/unit/topology_tests.rs
  • crates/adaptive/src/topology.rs
  • crates/adaptive/src/adaptive_hints_intercept.rs
  • crates/adaptive/src/tool_parallelism_learner.rs
  • crates/adaptive/src/lib.rs
  • crates/adaptive/src/acg_learner.rs
**/*.{rs,py,js,ts,tsx,jsx,go,sh,toml,yaml,yml,md}

📄 CodeRabbit inference engine (AGENTS.md)

Keep SPDX headers on source, docs, scripts, and configuration files. The project is Apache-2.0.

Files:

  • crates/adaptive/tests/unit/topology_tests.rs
  • crates/adaptive/src/topology.rs
  • crates/adaptive/src/adaptive_hints_intercept.rs
  • crates/adaptive/src/tool_parallelism_learner.rs
  • crates/adaptive/src/lib.rs
  • crates/adaptive/src/acg_learner.rs
**/*.{rs,py,go,js,ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Follow binding naming conventions: Rust and Python use snake_case, C FFI exports prefixed nemo_relay_, Go uses PascalCase for public APIs, Node.js uses camelCase.

Files:

  • crates/adaptive/tests/unit/topology_tests.rs
  • crates/adaptive/src/topology.rs
  • crates/adaptive/src/adaptive_hints_intercept.rs
  • crates/adaptive/src/tool_parallelism_learner.rs
  • crates/adaptive/src/lib.rs
  • crates/adaptive/src/acg_learner.rs
crates/**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

crates/**/*.rs: Keep async behavior on the existing tokio-based model. Bindings should preserve callback and future lifetimes rather than blocking or hiding async work unexpectedly.
Use Json = serde_json::Value in Rust-facing runtime APIs for JSON payload handling.

Files:

  • crates/adaptive/tests/unit/topology_tests.rs
  • crates/adaptive/src/topology.rs
  • crates/adaptive/src/adaptive_hints_intercept.rs
  • crates/adaptive/src/tool_parallelism_learner.rs
  • crates/adaptive/src/lib.rs
  • crates/adaptive/src/acg_learner.rs
crates/{core,adaptive}/**/*.rs

⚙️ CodeRabbit configuration file

crates/{core,adaptive}/**/*.rs: Review the Rust runtime for async correctness, scope isolation, middleware ordering, and event lifecycle regressions.
Pay close attention to task-local/thread-local scope propagation, callback lifetimes, stream finalization, and root_uuid isolation.
Public API changes should preserve existing behavior unless tests and docs show the intended migration path.

Files:

  • crates/adaptive/tests/unit/topology_tests.rs
  • crates/adaptive/src/topology.rs
  • crates/adaptive/src/adaptive_hints_intercept.rs
  • crates/adaptive/src/tool_parallelism_learner.rs
  • crates/adaptive/src/lib.rs
  • crates/adaptive/src/acg_learner.rs
{crates/**/tests/**,python/tests/**,go/nemo_relay/**/*_test.go}

⚙️ CodeRabbit configuration file

{crates/**/tests/**,python/tests/**,go/nemo_relay/**/*_test.go}: Tests should cover the behavior promised by the changed API surface, including error paths and cross-request isolation where relevant.
Prefer assertions on lifecycle events, scope stacks, middleware ordering, and binding parity over shallow smoke tests.

Files:

  • crates/adaptive/tests/unit/topology_tests.rs
🔇 Additional comments (7)
docs/adaptive-plugin/topology-aware-controls-design.mdx (1)

1-83: LGTM!

docs/adaptive-plugin/configuration.mdx (2)

202-235: Rust example now properly demonstrates topology-aware configuration.

The Rust example imports are extended with ConvergenceConfig and DriftConfig (line 205), and the initialization now explicitly sets tool_parallelism.drift, adaptive_hints.governor, acg.convergence, and top-level convergence fields (lines 221-235). This matches the Python and Node.js examples' completeness and addresses the prior guidance about explicit cross-language parity.


249-268: TOML example already includes all topology-aware control blocks with correct defaults and cross-language parity.

The TOML configuration snippet (lines 74-110) contains all three topology-aware control sections documented in the field table: [components.config.tool_parallelism.drift], [components.config.adaptive_hints.governor], and [components.config.acg.convergence]. Field names and default values align with the Python, Node.js, and Rust examples. No changes needed.

docs/adaptive-plugin/acg.mdx (1)

224-238: No changes needed; convergence validation rules are correctly documented in the "Common Validation Failures" section.

docs/adaptive-plugin/about.mdx (1)

1-64: SPDX header is correctly formatted.

The MDX file includes the required SPDX header with proper JSX comment delimiters {/* ... */} after the YAML frontmatter. No action needed.

crates/adaptive/src/adaptive_hints_intercept.rs (1)

137-150: LGTM!

Also applies to: 183-196, 223-227, 248-258

crates/adaptive/src/lib.rs (1)

34-34: Please confirm the required Rust validation commands were run for this change set.

I only see explicit test counts in the PR context; please confirm just test-rust, cargo fmt --all, and cargo clippy --workspace --all-targets -- -D warnings were executed successfully.

As per coding guidelines, "If any Rust code changed, always run just test-rust; also run cargo fmt --all; also run cargo clippy --workspace --all-targets -- -D warnings."

Source: Coding guidelines

Comment thread crates/adaptive/src/topology.rs Outdated
Comment thread crates/adaptive/tests/unit/topology_tests.rs Outdated
Signed-off-by: Teerth Sharma <teerths57@gmail.com>
@teerthsharma

teerthsharma commented Jun 22, 2026

Author

@willkill07 @lvojtku ready for re-review when you have bandwidth.

The requested changes have been addressed on the current head 131ffbc60948a0bcc2d8b8d813c8283f7eeb48d6:

  • design note moved out of published Fern docs to .github/designs/topology-aware-adaptive-controls.md
  • public Adaptive docs no longer link to or publish the internal design note
  • the internal design note now frames the feature by benefit gates, not capability: fewer ACG observations before convergence, stale tool-plan invalidation after cohort drift, and low-value learned hint omission while preserving manual overrides
  • sample evidence includes ACG convergence (observations-to-decision: without=50, with=3), tool-drift plan invalidation, and hint-governor shed/override behavior
  • internal topology controls remain folded into nemo-relay-adaptive; no new adaptive-topology crate remains
  • tests remain outside source files; public Python topology primitive surface remains removed

Validation for the latest docs move: cd fern && npx fern check --warnings passes with only the expected unauthenticated redirects warning.

@willkill07 willkill07 left a comment

Member

Additionally, the design doc isn't something that should be published as documentation.

I'm still trying to understand the benefit besides "just because we can"

Member

This is not a design document showing benefits. You state measurable Outcomes but do not provide any samples to demonstrate usefulness.

Author

Addressed in 012a96643d566113e7d53c00505638c611ea07e6 by adding a Sample Evidence section to this design doc.

It now includes concrete samples for each control:

  • ACG convergence: benchmark fixture with observations-to-decision: without=50, with=3
  • tool drift: sample cohort centroid movement from [1, 2, 0.0, 2] to [2, 3, 0.4, 3], showing stale plan invalidation
  • hint governor: sample learned hint shed at latency_sensitivity = 2.0 under epsilon = 10.0, while manual override at 11.0 still emits hints
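
To make the tool-drift sample concrete, here is one way the centroid movement could be measured. The metric is an assumption: the comment does not say how drift is computed, so this sketch uses plain Euclidean distance, and `centroid_drift` is a hypothetical helper rather than the PR's `DriftDetector`.

```rust
/// Hypothetical helper: Euclidean distance between two centroids.
/// Returns None for mismatched lengths or a non-finite result.
pub fn centroid_drift(prev: &[f64], next: &[f64]) -> Option<f64> {
    if prev.len() != next.len() {
        return None;
    }
    // Sum of squared per-dimension differences.
    let sum_sq: f64 = prev
        .iter()
        .zip(next)
        .map(|(a, b)| (b - a).powi(2))
        .sum();
    let distance = sum_sq.sqrt();
    // Reject NaN or infinity instead of reporting it as drift.
    distance.is_finite().then_some(distance)
}
```

Under that assumption, moving from [1, 2, 0.0, 2] to [2, 3, 0.4, 3] gives sqrt(1 + 1 + 0.16 + 1) = sqrt(3.16), or about 1.78, so any configured threshold at or below that value would invalidate the plan.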

I also ran cd fern && npx fern check --warnings; it passes with only the expected unauthenticated redirects warning.

Author

Addressed in 131ffbc60948a0bcc2d8b8d813c8283f7eeb48d6.

Changes made:

  • moved the design note out of the published Fern docs tree to .github/designs/topology-aware-adaptive-controls.md
  • removed the public docs link/page from docs/adaptive-plugin/about.mdx and docs/adaptive-plugin/configuration.mdx
  • rewrote the design note around benefit gates rather than capability: each control now has a concrete gate for when it should stay disabled versus when it is useful
  • kept the sample evidence in the internal note, including the benchmark sample observations-to-decision: without=50, with=3

Docs validation still passes with cd fern && npx fern check --warnings.

@teerthsharma teerthsharma Jun 22, 2026

Author

On it. I will provide as many measured samples as I can from my local device.

Signed-off-by: Teerth Sharma <teerths57@gmail.com>
Signed-off-by: Teerth Sharma <teerths57@gmail.com>
Signed-off-by: teerth sharma <teerths57@gmail.com>
@willkill07

willkill07 commented Jun 22, 2026

Member

@teerthsharma @KarmicDaoist Let's try this again:

  1. Please use a single account. Seeing a secondary account not only introduces confusion but also adds valid concerns about developer authenticity.
  2. If you are using GenAI to yield all of this, be up front about it. I've noticed similar PRs and issues created on other repositories. I do NOT want to merge something that has not been properly vetted, tested, and analyzed.
  3. Don't ping me (or anyone else) to review if any one of the following is true:
    a). you still intend to add comments to support your PR
    b). you still intend to update the PR with follow-up commits addressing bugs
    c). your PR is not "complete" as-is.

The branch name used for the PR includes the name POC -- I'm not sure this should be merged into NeMo Relay given your own naming conventions.

@willkill07 willkill07 marked this pull request as draft June 22, 2026 17:13
@willkill07 willkill07 added the DO NOT MERGE PR should not be merged; see PR for details label Jun 22, 2026
Signed-off-by: teerth sharma <teerths57@gmail.com>
@teerthsharma

teerthsharma commented Jun 22, 2026

Author

Thanks for the direct feedback. I understand the concern.

I will use only this GitHub account (teerthsharma) for this PR going forward. The earlier secondary-account interaction was my mistake, and I apologize for the confusion it caused.

I also want to be clear about GenAI usage: I used AI assistance for drafting, test planning, and review passes, but I personally own the PR, the validation, and the decision to submit it. I am not asking you to merge unvetted generated code.

The core architecture idea (topology) behind this PR comes from my open research work, arXiv:2604.19792. The Relay implementation here is not presented as research ornamentation; it is a guarded runtime hygiene layer:

  • ACG convergence stops re-observing stable prompt/cache structure once deterministic convergence is reached.
  • Tool parallelism drift invalidates stale fanout plans when observed cohort topology changes.
  • The hint governor sheds low-value learned metadata while preserving manual latency overrides.
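To make the first bullet concrete, here is a minimal Rust sketch of one way "deterministic convergence" could be detected: stop observing once the same structural fingerprint has been seen a fixed number of times in a row. The type `StableRunDetector`, the `u64` fingerprint, and the `required` threshold are assumptions made for illustration. This is not the `ConvergenceDetector` in crates/adaptive/src/topology.rs, whose API is not shown in this thread.

```rust
/// Hypothetical sketch (not the PR's `ConvergenceDetector`): reports
/// convergence once the same fingerprint is seen `required` times in a row.
#[derive(Debug)]
pub struct StableRunDetector {
    required: usize,
    last: Option<u64>,
    run: usize,
}

impl StableRunDetector {
    /// `required` is clamped to at least 1 so a single observation can converge.
    pub fn new(required: usize) -> Self {
        Self { required: required.max(1), last: None, run: 0 }
    }

    /// Feeds one observation fingerprint and returns whether the detector
    /// is converged after taking it into account.
    pub fn observe(&mut self, fingerprint: u64) -> bool {
        if self.last == Some(fingerprint) {
            self.run += 1;
        } else {
            // A changed fingerprint restarts the run.
            self.last = Some(fingerprint);
            self.run = 1;
        }
        self.is_converged()
    }

    pub fn is_converged(&self) -> bool {
        self.run >= self.required
    }
}

/// Returns how many observations were consumed before convergence was
/// reported, or `None` if the stream ended first.
pub fn observations_to_decision(stream: &[u64], required: usize) -> Option<usize> {
    let mut detector = StableRunDetector::new(required);
    for (index, &fingerprint) in stream.iter().enumerate() {
        if detector.observe(fingerprint) {
            return Some(index + 1);
        }
    }
    None
}
```

With `required = 3`, a stream of identical fingerprints yields a decision after three observations. Any change in the fingerprint restarts the count, which is also the simplest form of the invalidation described in the second bullet.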

The branch name containing POC was a naming mistake from my local workflow, not a statement that the PR is intended as an incomplete proof of concept. The code path is disabled by default and only activates under explicit config.

Since your comment, I added one final hardening commit (7cac143) that fixes convergence persistence ordering: ACG now stores the observations that produced a converged state before persisting converged=true, and it repairs stale or empty observation state instead of trusting cached convergence blindly.
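The ordering fix can be sketched as follows, assuming a simple key-value store. `MemoryStore`, its fields, and both method names are hypothetical stand-ins, not the code in commit 7cac143. The sketch shows only the two properties described above: evidence is written before the flag, and a flag without evidence is repaired on load.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for persisted convergence state.
#[derive(Default)]
pub struct MemoryStore {
    pub observations: HashMap<String, Vec<u64>>,
    pub converged: HashMap<String, bool>,
}

impl MemoryStore {
    /// Writes the observations first and the flag second, so a reader never
    /// sees `converged = true` without the observations that produced it.
    pub fn persist_converged(&mut self, key: &str, observations: Vec<u64>) {
        self.observations.insert(key.to_string(), observations);
        self.converged.insert(key.to_string(), true);
    }

    /// Loads the flag, but resets it to `false` when the stored observations
    /// are missing or empty instead of trusting the cached flag.
    pub fn load_converged(&mut self, key: &str) -> bool {
        let flagged = self.converged.get(key).copied().unwrap_or(false);
        let has_evidence = self
            .observations
            .get(key)
            .map_or(false, |stored| !stored.is_empty());
        if flagged && !has_evidence {
            self.converged.insert(key.to_string(), false);
            return false;
        }
        flagged
    }
}
```

A real store would also need the two writes to survive a crash between them. Writing the observations first means an interrupted write leaves the key unconverged rather than converged without evidence.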

Validation I ran locally:

  • cargo fmt --all
  • cargo test -p nemo-relay-adaptive -- --nocapture
  • just test-rust with RUST_TEST_THREADS=1
  • cargo clippy --workspace --all-targets -- -D warnings
  • Python tests: 383 passed
  • Node tests: 244 passed
  • Go binding tests: all packages passed using the freshly built GNU FFI library
  • just test-wasm: wasm-pack tests plus 80 JS package tests passed
  • pre-commit on changed files passed except the Windows lychee shell wrapper, which fails before checking links because of unescaped regex parentheses in /bin/bash -c

I do not intend to add more support comments or follow-up bugfix commits before review. This is my final review request for the current PR state.

@teerthsharma teerthsharma marked this pull request as ready for review June 22, 2026 17:39
@willkill07

Member

@teerthsharma the arXiv publication has exactly one in-passing mention of using Betti numbers. The main contribution of your work is not in line with this open PR.

I need more substantial evidence to merge in ~5k lines of code that the maintainers of Relay would have to maintain.

Again, the benchmarks show it does fundamentally work, but I still do not see the measured cost benefit. Until that is shown where we can also justify the maintenance cost, I'm highly reluctant to merge this functionality.

@teerthsharma

teerthsharma commented Jun 22, 2026

Author

To clarify my previous note: when I mentioned being "inspired" by the arXiv publication, I meant it in the broader sense of leveraging topological functions for this architecture, not a direct 1:1 implementation of that arrangement, since every repository and use case is different. My GitHub has many such examples, which, as you noticed, I am applying everywhere.
I completely understand the hesitation around absorbing the maintenance burden of ~5k lines of code without a definitive cost-to-benefit ratio. I am going to work on gathering more substantial evidence to demonstrate the concrete value here.
To ensure I'm focusing on the right areas, could you specify exactly what kind of evidence or specific benchmarking metrics would justify the upkeep for the Relay maintainers and give you the confidence to merge this?

@willkill07

Member

> To ensure I'm focusing on the right areas, could you specify exactly what kind of evidence or specific benchmarking metrics would justify the upkeep for the Relay maintainers and give you the confidence to merge this?

I'm really looking for an example that outperforms the current online adaptive config in some benchmark (PinchBench, SWE-bench, etc.). Measurement of success should be rooted in cost savings or measured e2e execution improvements. I understand that the latter may be harder to prove unless you have a Dynamo-accelerated endpoint, and that the former is currently only usable for Anthropic cache control.

But we need to see some numbers with clear benefit. Synthetic benchmarks alone do not suffice.

Labels

  • DO NOT MERGE (PR should not be merged; see PR for details)
  • Feature (a new feature)
  • lang:go (PR changes/introduces Go code)
  • lang:js (PR changes/introduces Javascript/Typescript code)
  • lang:python (PR changes/introduces Python code)
  • lang:rust (PR changes/introduces Rust code)
  • size:XXL (PR is very large)
