refactor: move spamoor benchmark into testify suite (#3107)

Draft
chatton wants to merge 6 commits into main from cian/bench-refactor

Conversation

@chatton
Contributor

@chatton chatton commented Feb 25, 2026

Overview

Introduces a benchmarking package within the e2e dir, with a test suite to which we can add additional spamoor tests.

Summary by CodeRabbit

  • Tests

    • Added a new end-to-end benchmark suite for spam-generation, tracing, and performance metrics with JSON output.
    • Introduced helpers to collect, parse and validate trace spans and benchmark metrics.
    • Removed an older duplicate smoke test.
  • Refactor

    • Promoted and renamed several test utilities and trace/metric helpers to public, improving reuse and reporting.
  • Chores

    • Updated CI workflow triggers and benchmark artifact paths/handling.

…mark

- Create test/e2e/benchmark/ subpackage with SpamoorSuite (testify/suite)
- Move spamoor smoke test into suite as TestSpamoorSmoke
- Split helpers into focused files: traces.go, output.go, metrics.go
- Introduce resultWriter for defer-based benchmark JSON output
- Export shared symbols from evm_test_common.go for cross-package use
- Restructure CI to fan-out benchmark jobs and fan-in publishing
- Run benchmarks on PRs only when benchmark-related files change
Resolve conflicts keeping the benchmark suite refactoring:
- benchmark.yml: keep path filters and suite-style test command
- evm_spamoor_smoke_test.go: keep deleted (moved to benchmark pkg)
- evm_test_common.go: keep exported types, drop writeTraceBenchmarkJSON
  (now in benchmark/output.go)
@coderabbitai
Contributor

coderabbitai bot commented Feb 25, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

Moves Spamoor benchmark logic into a new benchmark suite at test/e2e/benchmark/, adds evm-tagged helpers for metrics, traces, and JSON output, exports common EVM test helpers/types, updates CI workflow artifact paths, and removes the legacy evm_spamoor_smoke_test.go.

Changes

Cohort / File(s) Summary
CI Workflow
​.github/workflows/benchmark.yml
Updated PR trigger path filters; changed benchmark artifact upload/download and workspace paths to test/e2e/benchmark/; adjusted Spamoor smoke test invocation and EVMB binary path.
New Benchmark Suite
test/e2e/benchmark/suite_test.go, test/e2e/benchmark/spamoor_smoke_test.go
Adds SpamoorSuite with SetupTest and Smoke test orchestration (Jaeger, reth, sequencer, Spamoor), spammers, metrics collection, trace retrieval and assertions.
Benchmark Helpers
test/e2e/benchmark/metrics.go, test/e2e/benchmark/traces.go, test/e2e/benchmark/output.go
New evm build-tagged helpers: HTTP readiness polling, Prometheus counter summation, Jaeger trace parsing/adapters, span assertions, and JSON benchmark result writer.
E2E Common API
test/e2e/evm_test_common.go
Exported/renamed helpers and types: setup* → Setup*, traceSpan → TraceSpan, spanStats → SpanStats, aggregateSpanStats → AggregateSpanStats, printTraceReport → PrintTraceReport; removed legacy benchmark JSON writer.
Test Callsite Updates
test/e2e/evm_contract_bench_test.go, test/e2e/evm_force_inclusion_e2e_test.go, test/e2e/evm_full_node_e2e_test.go, test/e2e/failover_e2e_test.go
Updated call sites to use exported helper names and capitalized trace interfaces/report functions.
Removed legacy test
test/e2e/evm_spamoor_smoke_test.go
Deleted legacy Spamoor smoke test and associated helpers (migrated into test/e2e/benchmark/).

Sequence Diagram(s)

sequenceDiagram
    participant Test as "Spamoor Smoke Test"
    participant Suite as "SpamoorSuite"
    participant Jaeger as "Jaeger (tracing)"
    participant Reth as "Reth (execution)"
    participant Sequencer as "Sequencer"
    participant Spamoor as "Spamoor"

    Test->>Suite: run SetupTest
    Suite->>Jaeger: start Jaeger
    Suite->>Reth: start Reth (OTLP -> Jaeger)
    Suite->>Sequencer: start Sequencer
    Suite->>Spamoor: build & start Spamoor (RPC -> Sequencer)
    Spamoor->>Jaeger: export traces
    Test->>Spamoor: create & run spammers
    Spamoor->>Sequencer: submit txs
    Sequencer->>Reth: propose blocks
    Test->>Jaeger: collect traces & metrics
    Jaeger-->>Test: return trace payloads
    Test->>Test: parse/aggregate spans, assert, write results

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

T:testing, T:code-hygiene

Suggested reviewers

  • alpe
  • tzdybal
  • gupadhyaya

Poem

🐰 I hopped through traces, metrics in paw,
Rearranged the nest for Spamoor to draw,
Helpers exported, benchmarks tucked neat,
CI paths redirected — what a treat!
Hop, run, report — the suite’s ready to eat.

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 77.27%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Description check: ❓ Inconclusive. The description is vague and lacks sufficient detail about the changes, objectives, and rationale for the refactoring. Resolution: expand the description to explain why the refactoring was necessary, what benefits the testify suite provides, and clarify the relationship between the old and new test structures.
✅ Passed checks (1 passed)
  • Title check: ✅ Passed. The title 'refactor: move spamoor benchmark into testify suite' clearly and concisely describes the main change: moving an existing benchmark test into a new testify-based suite structure.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@github-actions
Contributor

github-actions bot commented Feb 25, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build: ✅ passed | Format: ⏩ skipped | Lint: ✅ passed | Breaking: ✅ passed | Updated (UTC): Feb 25, 2026, 2:46 PM

@claude
Contributor

claude bot commented Feb 25, 2026

Claude finished @chatton's task in 1m 54s.


Code Review

  • Gather context and understand PR scope
  • Review new benchmark package files
  • Review CI workflow changes
  • Review exported helper changes in evm_test_common.go
  • Compile and post review findings

Good refactor overall — isolating the Spamoor benchmark into its own testify suite with a clean env struct and per-test setup is much more maintainable than the previous monolithic approach. The SetupTest isolation pattern is correct and the use of t.Cleanup throughout is idiomatic.

I found a few issues ranging from functional bugs to style nitpicks.


Bugs / Functional Issues

1. OTLP endpoint configuration is incorrect (test/e2e/benchmark/suite_test.go:74-77)

The OpenTelemetry spec requires OTEL_EXPORTER_OTLP_ENDPOINT to be a base URL; the SDK appends /v1/traces automatically. The current code sets it to .../v1/traces, which will cause requests to go to .../v1/traces/v1/traces. Additionally, "http" is not a valid OTLP protocol value — the spec defines grpc, http/protobuf, and http/json.

// current (broken)
"OTEL_EXPORTER_OTLP_ENDPOINT="+jg.Internal.IngestHTTPEndpoint()+"/v1/traces",
"OTEL_EXPORTER_OTLP_PROTOCOL=http",
"OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=http",

// correct
"OTEL_EXPORTER_OTLP_ENDPOINT="+jg.Internal.IngestHTTPEndpoint(),   // base URL, no suffix
"OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf",
"OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=http/protobuf",
// OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is fine as-is (per-signal = full URL)

Fix this →

2. Hardcoded rethTag: "pr-140" will break (test/e2e/benchmark/spamoor_smoke_test.go:21)

This references an ephemeral PR image tag that will become invalid once the image is cleaned up. It should be configurable via an env var with a sensible default so CI can override it without a code change.

rethTag := os.Getenv("BENCH_RETH_TAG")
if rethTag == "" {
    rethTag = "pr-140"
}
e := s.setupEnv(config{
    rethTag:     rethTag,
    serviceName: "ev-node-smoke",
})

Fix this →


Design / Robustness

3. flush() uses require.NoError in a deferred context (test/e2e/benchmark/output.go:74-75)

require.NoError calls runtime.Goexit via FailNow. When flush is called via defer, this means an I/O error writing the benchmark JSON file can mask the original test failure. Prefer non-fatal logging here:

data, err := json.MarshalIndent(w.entries, "", "  ")
if err != nil {
    w.t.Logf("WARNING: failed to marshal benchmark JSON: %v", err)
    return
}
if err := os.WriteFile(outputPath, data, 0644); err != nil {
    w.t.Logf("WARNING: failed to write benchmark JSON to %s: %v", outputPath, err)
    return
}

4. Fixed 60-second sleep for trace accumulation (test/e2e/benchmark/spamoor_smoke_test.go:70)

This is the canonical fragile-test antipattern. On slow CI runners it may be too short; on fast machines it wastes time. The collectServiceTraces helper already polls Jaeger with a timeout via WaitForTraces — the sleep before it could simply be removed, or converted to poll for a minimum span count.

5. addEntry method is dead code (test/e2e/benchmark/output.go:61)

resultWriter.addEntry is defined but never called anywhere in the package. Either add a use or remove it to avoid confusion about whether it's part of the intended API.


Code Quality Nitpicks

6. Loop variable t shadows test helper convention (test/e2e/benchmark/traces.go:25)

for _, t := range traces {   // 't' reads as testing.TB to any Go developer

Rename to traceEntry or raw to avoid confusion.

7. Stale doc comment (test/e2e/evm_test_common.go:532)

The comment still says setupCommonEVMEnv (the old unexported name); Go convention requires the doc comment to start with the exported function name:

// SetupCommonEVMEnv creates and initializes ev-reth instances...

8. requireHTTP has no cancellation path (test/e2e/benchmark/metrics.go:15)

The polling loop uses time.Now().Before(deadline) with time.Sleep inside, with no hook for test context cancellation (t.Context()). If the test is killed mid-flight, this loop will run until the deadline. Minor in practice but inconsistent with the rest of the suite which correctly passes ctx everywhere.


CI

9. Missing path trigger for Block Executor benchmark (.github/workflows/benchmark.yml:11-16)

The evm-benchmark job runs go test ./block/internal/executing/... but block/internal/executing/** is not in the PR path filter. Changes to those files won't trigger the workflow on PRs.

Note: I cannot modify workflow files directly (GitHub App permission restriction), but this is a straightforward one-liner addition to the paths: list.


Performance Regression

The benchmark-action comment shows 3 Spamoor metrics regressed >1.5x vs the previous run. These are likely noise from the infrastructure refactor rather than a code regression, but worth checking:

  • Store.NewBatch (avg): 1.75x
  • Batch.SetHeight (avg): 1.60x
  • Store.GetMetadata (avg): 1.57x

These could be caused by the new suite initialization overhead or Docker orchestration differences in the refactored setup.


Summary: The structural refactor is clean and the direction is right. The two hard blockers before merge are the OTLP endpoint misconfiguration (item 1), which will produce silently incorrect tracing data, and the hardcoded PR image tag (item 2), which will cause test failures once that image is cleaned up.

defer w.flush()

e := s.setupEnv(config{
rethTag: "pr-140",
Contributor Author


This is the PR in which the traces were added; we can change this to a proper tag in a follow-up.

@chatton chatton marked this pull request as ready for review February 25, 2026 12:30
pull_request:
branches:
- main
paths:
Contributor Author

@chatton chatton Feb 25, 2026


only run benchmark tests for PRs when relevant files are changed.

go test sets the working directory to the package under test, so the
env var should be relative to test/e2e/benchmark/, not test/e2e/.
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
test/e2e/benchmark/spamoor_smoke_test.go (1)

20-23: ⚠️ Potential issue | 🟠 Major

Hardcoded rethTag: "pr-140" will become stale and cause opaque failures.

This tag refers to a specific container image built for a PR. Once that image is cleaned up or the tag is overwritten, the test will silently fail to start or pull the container with a confusing error. Expose this via an environment variable or CLI flag so it can be updated without a code change.

🔧 Suggested refactor
+rethTagFromEnv := os.Getenv("BENCH_RETH_TAG")
+if rethTagFromEnv == "" {
+    rethTagFromEnv = "pr-140"
+}
 e := s.setupEnv(config{
-    rethTag:     "pr-140",
+    rethTag:     rethTagFromEnv,
     serviceName: "ev-node-smoke",
 })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/benchmark/spamoor_smoke_test.go` around lines 20 - 23, Replace the
hardcoded rethTag value ("pr-140") in the test setup by reading it from
configuration (environment variable or test CLI flag) so tests don't break when
image tags change: update the call site where s.setupEnv(config{ rethTag:
"pr-140", ... }) is used in spamoor_smoke_test.go to instead pull the tag from
an env var (eg. os.Getenv("RETH_TAG")) or a provided flag, and adjust the config
struct and any test bootstrap that constructs config to accept and pass that
dynamic value (look for the config type, its rethTag field, and the setupEnv
function to make the change).
🧹 Nitpick comments (5)
test/e2e/benchmark/metrics.go (1)

15-34: Consider accepting a context.Context for cancellable polling.

requireHTTP drives a blocking polling loop with no way to interrupt it if the parent test is cancelled mid-flight (e.g., via t.Context() deadline or a FailNow from another goroutine). For long timeouts this can leave dangling goroutines/blocked tests.

🔧 Suggested refactor
-func requireHTTP(t testing.TB, url string, timeout time.Duration) {
+func requireHTTP(t testing.TB, ctx context.Context, url string, timeout time.Duration) {
 	t.Helper()
 	client := &http.Client{Timeout: 200 * time.Millisecond}
-	deadline := time.Now().Add(timeout)
+	ctx, cancel := context.WithTimeout(ctx, timeout)
+	defer cancel()
 	var lastErr error
-	for time.Now().Before(deadline) {
+	ticker := time.NewTicker(100 * time.Millisecond)
+	defer ticker.Stop()
+	for {
+		select {
+		case <-ctx.Done():
+			t.Fatalf("daemon not ready at %s: %v", url, lastErr)
+			return
+		case <-ticker.C:
+		}
 		resp, err := client.Get(url)
 		if err == nil {
 			_ = resp.Body.Close()
 			if resp.StatusCode >= 200 && resp.StatusCode < 300 {
 				return
 			}
 			lastErr = fmt.Errorf("status %d", resp.StatusCode)
 		} else {
 			lastErr = err
 		}
-		time.Sleep(100 * time.Millisecond)
 	}
-	t.Fatalf("daemon not ready at %s: %v", url, lastErr)
 }

As per coding guidelines, "Use context.Context for cancellation in Go."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/benchmark/metrics.go` around lines 15 - 34, The requireHTTP function
currently blocks and can't be cancelled; change its signature to accept a
context.Context (e.g., requireHTTP(ctx context.Context, t testing.TB, url
string, timeout time.Duration) or use t.Context() inside) and use that context
for cancellation inside the polling loop: create requests with
http.NewRequestWithContext or attach ctx to the client.Do call, replace
time.Sleep with a select on ctx.Done() and time.After(100*time.Millisecond),
check ctx.Err() and call t.Fatalf when the context is cancelled, and ensure you
still honor the timeout logic and preserve the lastErr handling in the loop.
test/e2e/benchmark/spamoor_smoke_test.go (1)

68-70: Fixed 60s time.Sleep is fragile for trace accumulation.

On fast machines traces may be ready much sooner; on slow CI runners traces may not be ready even after 60s. Consider polling the Jaeger API with a timeout instead of an unconditional sleep, making the test both faster in the happy path and more robust in the slow path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/benchmark/spamoor_smoke_test.go` around lines 68 - 70, Replace the
fixed time.Sleep(60 * time.Second) wait with a polling loop that queries the
Jaeger API until the expected traces/spans from spamoor appear or a context
deadline elapses; implement this by replacing the direct call to time.Sleep with
a function (e.g., waitForTraces(ctx, serviceName, expectedSpanCount)) that uses
a context with timeout, exponential/backoff retries, and HTTP calls to the
Jaeger query endpoint, returning early when traces are observed and failing the
test only if the timeout is reached—update the test to call waitForTraces
instead of time.Sleep and use the existing service identifier used in the test
to locate traces.
test/e2e/benchmark/traces.go (1)

23-47: Loop variable t shadows the conventional test-helper name convention.

Using t as a range variable in extractSpansFromTraces (which imports "testing") is not a bug but is slightly confusing, particularly for readers accustomed to t meaning testing.TB. A more descriptive name like traceEntry or raw would improve clarity.

🔧 Suggested refactor
-	for _, t := range traces {
-		traceMap, ok := t.(map[string]any)
+	for _, traceEntry := range traces {
+		traceMap, ok := traceEntry.(map[string]any)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/benchmark/traces.go` around lines 23 - 47, In
extractSpansFromTraces, rename the range variable t to a clearer name (e.g.,
traceEntry or rawTrace) to avoid shadowing the conventional testing variable t;
update all uses of t in that function (the outer loop in extractSpansFromTraces
and its casts to traceMap) so the logic remains unchanged but the variable is
more descriptive and less confusing to readers familiar with testing.TB.
test/e2e/benchmark/output.go (1)

67-76: require.NoError inside a deferred flush will call runtime.Goexit on failure, which is safe but worth documenting.

When flush is called via defer after a test has already failed, require.NoError → FailNow → runtime.Goexit is invoked from within the deferred function, which Go handles correctly (deferred calls complete). No behavioral bug here, but the interaction with require vs os.WriteFile returning an error silently (by being suppressed) could be surprising. Consider logging the error path rather than asserting fatally so benchmark output failures don't obscure the real test failure.

🔧 Suggested refactor
 func (w *resultWriter) flush() {
 	outputPath := os.Getenv("BENCH_JSON_OUTPUT")
 	if outputPath == "" || len(w.entries) == 0 {
 		return
 	}
 
 	data, err := json.MarshalIndent(w.entries, "", "  ")
-	require.NoError(w.t, err, "failed to marshal benchmark JSON")
-	require.NoError(w.t, os.WriteFile(outputPath, data, 0644), "failed to write benchmark JSON to %s", outputPath)
+	if err != nil {
+		w.t.Logf("WARNING: failed to marshal benchmark JSON: %v", err)
+		return
+	}
+	if err := os.WriteFile(outputPath, data, 0644); err != nil {
+		w.t.Logf("WARNING: failed to write benchmark JSON to %s: %v", outputPath, err)
+		return
+	}
 	w.t.Logf("wrote %d benchmark entries to %s", len(w.entries), outputPath)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/benchmark/output.go` around lines 67 - 76, In resultWriter.flush,
avoid using require.NoError (which calls FailNow/Goexit when deferred) for the
json.MarshalIndent and os.WriteFile errors; instead check the error values from
json.MarshalIndent and os.WriteFile and log them non-fatally (e.g., use
w.t.Errorf or w.t.Logf with descriptive messages referencing outputPath and the
error) so flush reports failures without invoking runtime.Goexit from a deferred
context; update the calls that currently use require.NoError(w.t, ...) to
explicit if err != nil { w.t.Errorf("...: %v", err); return } or similar logging
behavior for both the marshal and write steps in resultWriter.flush.
test/e2e/evm_test_common.go (1)

532-534: Doc comment references the old unexported name.

Line 532 says setupCommonEVMEnv but the function is now exported as SetupCommonEVMEnv. Per Go convention, the doc comment should start with the exported identifier name.

Proposed fix
-// setupCommonEVMEnv creates and initializes ev-reth instances, while also initializing the local ev-node instance
+// SetupCommonEVMEnv creates and initializes ev-reth instances, while also initializing the local ev-node instance
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/evm_test_common.go` around lines 532 - 534, The doc comment
currently starts with the old unexported name; update the comment to begin with
the exported function name SetupCommonEVMEnv so it follows Go doc conventions
(e.g., change the leading sentence from "setupCommonEVMEnv creates..." to
"SetupCommonEVMEnv creates..."), leaving the rest of the description intact and
ensuring the comment now documents the exported SetupCommonEVMEnv function.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/benchmark.yml:
- Around line 11-16: Update the PR path filter in the GitHub Actions workflow to
include the block executor benchmark sources so changes to that package trigger
the workflow; specifically add the path pattern 'block/internal/executing/**'
(or more broadly 'block/**' if preferred) to the existing paths array in
.github/workflows/benchmark.yml so the job that runs `go test
./block/internal/executing/...` will trigger on relevant PRs.

In `@test/e2e/benchmark/suite_test.go`:
- Around line 73-78: The OTLP env setup in the b.WithTag(...).WithEnv call is
wrong: change the value for "OTEL_EXPORTER_OTLP_ENDPOINT" to use the base ingest
URL (remove the "/v1/traces" suffix) so the SDK can append per-signal paths,
keep "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT" as-is (full URL), and update
"OTEL_EXPORTER_OTLP_PROTOCOL" and "OTEL_EXPORTER_OTLP_TRACES_PROTOCOL" from
"http" to the spec-correct "http/protobuf" (or another valid spec value like
"grpc" or "http/json" if intended).


ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 212ac08 and cc56590.

📒 Files selected for processing (12)
  • .github/workflows/benchmark.yml
  • test/e2e/benchmark/metrics.go
  • test/e2e/benchmark/output.go
  • test/e2e/benchmark/spamoor_smoke_test.go
  • test/e2e/benchmark/suite_test.go
  • test/e2e/benchmark/traces.go
  • test/e2e/evm_contract_bench_test.go
  • test/e2e/evm_force_inclusion_e2e_test.go
  • test/e2e/evm_full_node_e2e_test.go
  • test/e2e/evm_spamoor_smoke_test.go
  • test/e2e/evm_test_common.go
  • test/e2e/failover_e2e_test.go
💤 Files with no reviewable changes (1)
  • test/e2e/evm_spamoor_smoke_test.go

Comment on lines +11 to +16
paths:
- 'test/e2e/benchmark/**'
- 'test/e2e/evm_contract_bench_test.go'
- 'test/e2e/evm_test_common.go'
- 'test/e2e/sut_helper.go'
- '.github/workflows/benchmark.yml'

⚠️ Potential issue | 🟠 Major

PR path filter misses Block Executor benchmark sources.

This workflow runs go test ./block/internal/executing/... (Line 40), but PR paths do not include that tree. Changes to those benchmarks/sources can skip this workflow unexpectedly.

Suggested fix
   pull_request:
     branches:
       - main
     paths:
       - 'test/e2e/benchmark/**'
       - 'test/e2e/evm_contract_bench_test.go'
       - 'test/e2e/evm_test_common.go'
       - 'test/e2e/sut_helper.go'
+      - 'block/internal/executing/**'
       - '.github/workflows/benchmark.yml'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/benchmark.yml around lines 11 - 16, Update the PR path
filter in the GitHub Actions workflow to include the block executor benchmark
sources so changes to that package trigger the workflow; specifically add the
path pattern 'block/internal/executing/**' (or more broadly 'block/**' if
preferred) to the existing paths array in .github/workflows/benchmark.yml so the
job that runs `go test ./block/internal/executing/...` will trigger on relevant
PRs.

Comment on lines +73 to +78
b.WithTag(cfg.rethTag).WithEnv(
"OTEL_EXPORTER_OTLP_ENDPOINT="+jg.Internal.IngestHTTPEndpoint()+"/v1/traces",
"OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="+jg.Internal.IngestHTTPEndpoint()+"/v1/traces",
"OTEL_EXPORTER_OTLP_PROTOCOL=http",
"OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=http",
"RUST_LOG=debug",

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

OpenTelemetry OTEL_EXPORTER_OTLP_ENDPOINT specification base URL path suffix

💡 Result:

For OTLP/HTTP, OTEL_EXPORTER_OTLP_ENDPOINT is specified as a base URL. Exporters must append a signal-specific relative path to it:

  • Traces: v1/traces
  • Metrics: v1/metrics
  • Logs: v1/logs

So if you set:

  • OTEL_EXPORTER_OTLP_ENDPOINT=http://collector:4318

the actual requests go to:

  • http://collector:4318/v1/traces, .../v1/metrics, .../v1/logs (respectively). The base URL may itself include a path prefix (e.g. /mycollector/), in which case the v1/* suffix is appended under that prefix. [1]

If you instead set a per-signal endpoint (e.g. OTEL_EXPORTER_OTLP_TRACES_ENDPOINT), that URL is used as-is (no automatic /v1/traces appending); if it has no path component, / is used. [1]

Separately, the OTLP protocol spec defines the default OTLP/HTTP paths as /v1/traces, /v1/metrics, /v1/logs (server-side defaults), but clients may be configured to use non-default paths. [2]

For OTLP/gRPC, OTEL_EXPORTER_OTLP_ENDPOINT is a target for the gRPC client (typically host/port; path suffixes like /v1/traces are not part of the gRPC endpoint concept). [1]

Sources: [1] OpenTelemetry OTLP Exporter spec (Endpoint URLs for OTLP/HTTP) [2] OTLP Specification (default OTLP/HTTP URL paths)


🌐 Web query:

OTEL_EXPORTER_OTLP_PROTOCOL valid values http/protobuf http/json grpc

💡 Result:

OTEL_EXPORTER_OTLP_PROTOCOL valid values are exactly: grpc, http/protobuf, http/json. [1][2]


🏁 Scripts executed (repository: evstack/ev-node):

  • find . -name "suite_test.go" -path "*/benchmark/*" | head -5
  • sed -n '65,85p' ./test/e2e/benchmark/suite_test.go
  • rg "IngestHTTPEndpoint" -A 2 -B 2 | head -50
  • rg -i "reth.*otlp|otlp.*protocol" | head -20
  • rg -i "http/protobuf|http/json" test/ | head -20

🌐 Web query:

Reth OTLP exporter HTTP protocol http/protobuf http/json specification

💡 Result:

OTLP exporter over HTTP: http/protobuf vs http/json (spec)

Transport & endpoints (OTLP/HTTP):

  • Uses HTTP POST over HTTP/1.1 or HTTP/2. (opentelemetry.io)
  • Default URL paths:
    • Traces: POST /v1/traces with ExportTraceServiceRequest
    • Metrics: POST /v1/metrics with ExportMetricsServiceRequest
    • Logs: POST /v1/logs with ExportLogsServiceRequest (opentelemetry.io)
  • When SDKs are configured with a base endpoint (OTEL_EXPORTER_OTLP_ENDPOINT), they construct per-signal URLs by appending v1/traces, v1/metrics, v1/logs (unless a per-signal endpoint is set). (opentelemetry.io)

Encoding choices (same protobuf schema, different wire format):

  • http/protobuf: protobuf messages serialized in binary proto3. (opentelemetry.io)
  • http/json: protobuf messages serialized using protobuf JSON mapping, with OTLP-specific rules. (opentelemetry.io)

OTLP JSON-specific mapping rules (important for http/json):

Compression:

  • Client may gzip the request body; if so it MUST send Content-Encoding: gzip. (opentelemetry.io)
  • Client may send Accept-Encoding: gzip; server may gzip responses. (opentelemetry.io)

Responses:

  • Success is HTTP 200 OK with an Export<signal>ServiceResponse (e.g., ExportTraceServiceResponse). (opentelemetry.io)
  • Server must respond using the same Content-Type it received. (opentelemetry.io)

How exporters select protocol:

  • OTEL_EXPORTER_OTLP_PROTOCOL (and per-signal variants) supports: grpc, http/protobuf, http/json; default is typically http/protobuf. (opentelemetry.io)

Where the protobuf definitions live:

  • The canonical .proto files and OTLP protocol materials are in open-telemetry/opentelemetry-proto. (github.com)



Remove the /v1/traces suffix from OTEL_EXPORTER_OTLP_ENDPOINT and change protocol values to http/protobuf.

The OpenTelemetry specification requires that OTEL_EXPORTER_OTLP_ENDPOINT be a base URL, with the SDK automatically appending signal-specific paths (/v1/traces, /v1/metrics, /v1/logs). Setting it to …/v1/traces will cause requests to be sent to …/v1/traces/v1/traces. The per-signal endpoint OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is correct as-is since it expects the full URL without automatic appending.

Additionally, OTEL_EXPORTER_OTLP_PROTOCOL and OTEL_EXPORTER_OTLP_TRACES_PROTOCOL must use spec-defined values: grpc, http/protobuf, or http/json. The value "http" is not valid.
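The doubled-path failure mode is easy to see with a small resolver that mimics the spec's per-signal rule. This is a hypothetical sketch for illustration (`resolveTracesURL` is not SDK or ev-node code):

```go
package main

import (
	"fmt"
	"strings"
)

// resolveTracesURL is a hypothetical helper (not SDK or ev-node code) that
// mimics the spec rule: a per-signal endpoint is used verbatim, while the
// base endpoint gets the "v1/traces" path appended by the exporter.
func resolveTracesURL(baseEndpoint, tracesEndpoint string) string {
	if tracesEndpoint != "" {
		return tracesEndpoint
	}
	return strings.TrimRight(baseEndpoint, "/") + "/v1/traces"
}

func main() {
	// Misconfigured: the base already carries the signal path, so it doubles.
	fmt.Println(resolveTracesURL("http://collector:4318/v1/traces", ""))
	// → http://collector:4318/v1/traces/v1/traces

	// Correct: base URL only; the exporter appends the path itself.
	fmt.Println(resolveTracesURL("http://collector:4318", ""))
	// → http://collector:4318/v1/traces
}
```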

Proposed fix
-				"OTEL_EXPORTER_OTLP_ENDPOINT="+jg.Internal.IngestHTTPEndpoint()+"/v1/traces",
+				"OTEL_EXPORTER_OTLP_ENDPOINT="+jg.Internal.IngestHTTPEndpoint(),
 				"OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="+jg.Internal.IngestHTTPEndpoint()+"/v1/traces",
-				"OTEL_EXPORTER_OTLP_PROTOCOL=http",
-				"OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=http",
+				"OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf",
+				"OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=http/protobuf",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/benchmark/suite_test.go` around lines 73 - 78, the OTLP env setup in
the b.WithTag(...).WithEnv call is wrong: change the value for
"OTEL_EXPORTER_OTLP_ENDPOINT" to use the base ingest URL (remove the
"/v1/traces" suffix) so the SDK can append per-signal paths, keep
"OTEL_EXPORTER_OTLP_TRACES_ENDPOINT" as-is (full URL), and update
"OTEL_EXPORTER_OTLP_PROTOCOL" and "OTEL_EXPORTER_OTLP_TRACES_PROTOCOL" from
"http" to the spec-correct "http/protobuf" (or another valid spec value like
"grpc" or "http/json" if intended).

go test treats all arguments after an unknown flag (--evm-binary) as
test binary args, so ./benchmark/ was never recognized as a package
pattern.
@chatton chatton marked this pull request as draft February 25, 2026 13:44
go test sets the cwd to the package directory (test/e2e/benchmark/),
so the binary path needs an extra parent traversal.
Contributor

@github-actions github-actions bot left a comment


⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Spamoor Trace Benchmarks'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.50.

| Benchmark suite | Current: ae525ca | Previous: 212ac08 | Ratio |
| --- | --- | --- | --- |
| SpamoorSmoke - Batch.SetHeight (avg) | 14.966867469879517 us | 9.33933933933934 us | 1.60 |
| SpamoorSmoke - Store.GetMetadata (avg) | 9.872934263679765 us | 6.306730415593968 us | 1.57 |
| SpamoorSmoke - Store.NewBatch (avg) | 13.335843373493976 us | 7.635542168674699 us | 1.75 |

This comment was automatically generated by workflow using github-action-benchmark.

The benchmark package doesn't define the --binary flag that test-e2e
passes. It has its own CI workflow so it doesn't need to run here.
@codecov

codecov bot commented Feb 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.93%. Comparing base (212ac08) to head (85c9d2d).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3107      +/-   ##
==========================================
- Coverage   60.95%   60.93%   -0.02%     
==========================================
  Files         113      113              
  Lines       11617    11617              
==========================================
- Hits         7081     7079       -2     
- Misses       3738     3739       +1     
- Partials      798      799       +1     
| Flag | Coverage Δ |
| --- | --- |
| combined | 60.93% <ø> (-0.02%) ⬇️ |

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.