Add the LLM Validation gate for AGENTS.md + the libdatadog-bump skill by NachoEchevarria · Pull Request #8845 · DataDog/dd-trace-dotnet

NachoEchevarria · 2026-06-29T11:38:55Z

What

Adopts the LLM Validation gate in dd-trace-dotnet. When a PR changes an AI-behavior file (AGENTS.md or a gated Claude skill), CI runs a benchmark suite that compares the baseline (master's instructions) against the candidate (this PR's) under an identical model / judge / case set, then posts a PASS/WARN/FAIL comment. It answers "did this doc edit actually make the agent better or worse?" with a blind, repeated pairwise signal instead of eyeballing prose.

Footprint in this repo

.llm-validation/config.yaml — what's monitored (AGENTS.md + .claude/skills/bump-libdatadog/SKILL.md), the level presets, and the gate policy.
.llm-validation/suites/dotnet-tracer-agent-v0.1.yaml — 17 tracer-specific benchmark cases (repo navigation, config keys, logging terminology, instrumentation debugging, the libdatadog bump, plus control cases).
.gitlab-ci.yml — a single include: of the reusable job shipped by the platform repo. That's the whole footprint; the engine (CLI, judge, runner) lives in the platform repo.
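As a rough illustration of how a monitored-file list like the one in config.yaml could drive the trigger decision, here is a minimal Python sketch. It is not the platform's implementation: the `MONITORED` set simply copies the two paths named above, and `changed_files`, `touched_monitored`, and the `origin/master` base ref are hypothetical names chosen for the example.

```python
import subprocess

# Paths copied from the PR description; the real list lives in
# .llm-validation/config.yaml, whose schema is not shown here.
MONITORED = {"AGENTS.md", ".claude/skills/bump-libdatadog/SKILL.md"}


def changed_files(base="origin/master"):
    """List files changed between the merge base with `base` and HEAD."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def touched_monitored(files, monitored=MONITORED):
    """Return the monitored files present in `files` (empty list means skip)."""
    return sorted(set(files) & set(monitored))
```

A job built this way would exit early when `touched_monitored(changed_files())` is empty, and could otherwise pass the returned paths on to narrow the case set.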

How it behaves

Triggers only when a monitored file changes (early git-diff skip otherwise, so other PRs cost ~nothing).
Default preset gate: ~6 curated high-signal cases at 8 runs each; files: targeting narrows further to the file(s) that actually changed.
Agent-under-test is Claude Code headless, run in a checkout of this repo (real navigation). One blind comparative judge, order-swapped, produces a win-rate. The gate only blocks on a confident regression (a new safety/bad signal, or a tight-CI pairwise loss); noisy or marginal changes WARN, never block.
Internal-only: runs on the ddbuild GitLab pipeline (needs the AI Gateway + authanywhere), so it can't gate fork/external PRs.
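The order-swapped pairwise signal and the "block only on a confident regression" policy can be sketched as follows. This is an assumption-heavy illustration, not the engine's code: the function names, the half-credit for ties, the Wilson interval, and the 0.5 cut-off are all hypothetical choices, and the safety/bad-signal check is left out entirely.

```python
import math


def unblind(pick, candidate_shown_first):
    """Map the judge's positional pick ('first', 'second', 'tie') to an arm."""
    if pick == "tie":
        return "tie"
    return "candidate" if (pick == "first") == candidate_shown_first else "baseline"


def win_rate(outcomes):
    """Candidate win rate; a tie counts as half a win (an assumption)."""
    score = sum(1.0 if o == "candidate" else 0.5 if o == "tie" else 0.0
                for o in outcomes)
    return score / len(outcomes)


def wilson_interval(p, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion p over n trials."""
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def verdict(outcomes):
    """FAIL only when the whole interval sits below 0.5; marginal losses WARN."""
    p = win_rate(outcomes)
    _, upper = wilson_interval(p, len(outcomes))
    if upper < 0.5:
        return "FAIL"
    return "WARN" if p < 0.5 else "PASS"
```

Each case would be judged twice, once with the candidate shown first and once with the baseline shown first, so that a judge's position bias cancels out before the outcomes are pooled.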

The AGENTS.md change

The one-character em-dash tweak in AGENTS.md is intentional: it is a trivial, clearly non-regressing edit that trips the gate, so this PR's own pipeline demonstrates the gate running instead of merging it unexercised. Expected verdict: PASS (no regression).

pr-commenter · 2026-06-29T12:29:39Z

Benchmarks

Benchmark execution time: 2026-07-03 14:58:18

Comparing candidate commit 94fd792 in PR branch nacho/LLMPlatformJob with baseline commit bb5a507 in branch master.

📊 Benchmarking dashboard

Found 0 performance improvements and 1 performance regression! Performance is the same for 71 metrics, with 0 unstable metrics, 59 known flaky benchmarks, and 67 flaky benchmarks without significant changes.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

🟩 = significantly better candidate vs. baseline
🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because most changes are small:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'
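The rule illustrated by the two diagrams can be written as a small Python check. This is a sketch under one reading of "entirely outside the threshold" (the CI lies wholly above +threshold or wholly below -threshold); the function name `classify` and the `higher_is_better` flag are hypothetical, and the 1% threshold is taken from the diagram, not from any real configuration.

```python
def classify(ci_low_pct, ci_high_pct, threshold_pct, higher_is_better=False):
    """Classify a relative-difference CI (in percent) against a threshold.

    higher_is_better=False suits metrics such as execution_time;
    higher_is_better=True suits metrics such as throughput.
    """
    if ci_low_pct > threshold_pct:
        increased = True          # whole CI is above +threshold
    elif ci_high_pct < -threshold_pct:
        increased = False         # whole CI is below -threshold
    else:
        return "same"             # CI overlaps the [-threshold, +threshold] band
    return "better" if increased == higher_is_better else "worse"


# The two diagrams above, with a 1% threshold:
print(classify(-0.6, 1.2, 1.0))   # same
print(classify(1.3, 3.1, 1.0))    # worse
```

For a throughput metric the same interval arithmetic applies with the direction flipped, which is why a negative CI on throughput is reported as a regression.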

scenario:Benchmarks.Trace.DbCommandBenchmark.ExecuteNonQuery net472

🟥 throughput [-22858.583op/s; -18870.138op/s] or [-6.438%; -5.315%]

Known flaky benchmarks

These benchmarks are marked as flaky and will not trigger a failure. Modify FLAKY_BENCHMARKS_REGEX to control which benchmarks are marked as flaky.

scenario:Benchmarks.Trace.ActivityBenchmark.StartStopWithChild net472

🟥 throughput [-7117.922op/s; -6584.534op/s] or [-8.440%; -7.807%]

scenario:Benchmarks.Trace.AgentWriterBenchmark.WriteAndFlushEnrichedTraces net472

🟥 execution_time [+317.345ms; +322.765ms] or [+157.478%; +160.168%]
🟥 throughput [-42.557op/s; -38.418op/s] or [-7.657%; -6.912%]

scenario:Benchmarks.Trace.AgentWriterBenchmark.WriteAndFlushEnrichedTraces net6.0

🟥 execution_time [+378.287ms; +380.391ms] or [+298.870%; +300.532%]
🟩 throughput [+93.713op/s; +97.270op/s] or [+12.356%; +12.825%]

scenario:Benchmarks.Trace.AgentWriterBenchmark.WriteAndFlushEnrichedTraces netcoreapp3.1

🟥 execution_time [+393.503ms; +395.614ms] or [+348.235%; +350.103%]

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleMoreComplexBody net472

🟥 allocated_mem [+1.308KB; +1.308KB] or [+27.528%; +27.540%]

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleMoreComplexBody net6.0

🟥 allocated_mem [+471 bytes; +472 bytes] or [+9.976%; +9.987%]
🟩 execution_time [-15.820ms; -11.625ms] or [-7.389%; -5.429%]
🟩 throughput [+7049.344op/s; +9819.150op/s] or [+5.146%; +7.167%]

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleMoreComplexBody netcoreapp3.1

🟥 allocated_mem [+1.272KB; +1.272KB] or [+27.500%; +27.510%]

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleSimpleBody net472

🟥 allocated_mem [+1.307KB; +1.307KB] or [+105.743%; +105.758%]
🟥 throughput [-259311.183op/s; -255169.894op/s] or [-26.477%; -26.054%]

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleSimpleBody net6.0

🟥 allocated_mem [+471 bytes; +472 bytes] or [+38.557%; +38.566%]
🟩 execution_time [-26.358ms; -20.002ms] or [-11.754%; -8.920%]

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleSimpleBody netcoreapp3.1

🟥 allocated_mem [+1.272KB; +1.272KB] or [+105.288%; +105.304%]
🟥 throughput [-163489.990op/s; -137716.062op/s] or [-23.490%; -19.787%]

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorMoreComplexBody net6.0

🟩 throughput [+9237.004op/s; +12237.551op/s] or [+5.877%; +7.787%]

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorMoreComplexBody netcoreapp3.1

🟩 throughput [+10248.815op/s; +12923.315op/s] or [+8.165%; +10.295%]

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorSimpleBody net6.0

🟩 throughput [+477695.953op/s; +499765.399op/s] or [+15.928%; +16.664%]

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorSimpleBody netcoreapp3.1

🟩 execution_time [-18.699ms; -14.368ms] or [-8.619%; -6.623%]

scenario:Benchmarks.Trace.Asm.AppSecEncoderBenchmark.EncodeArgs net472

🟥 execution_time [+300.060ms; +300.767ms] or [+149.930%; +150.283%]

scenario:Benchmarks.Trace.Asm.AppSecEncoderBenchmark.EncodeArgs net6.0

🟥 execution_time [+299.702ms; +302.789ms] or [+151.140%; +152.697%]

scenario:Benchmarks.Trace.Asm.AppSecEncoderBenchmark.EncodeArgs netcoreapp3.1

🟥 execution_time [+299.986ms; +303.074ms] or [+151.110%; +152.665%]

scenario:Benchmarks.Trace.Asm.AppSecEncoderBenchmark.EncodeLegacyArgs net472

🟥 execution_time [+297.556ms; +298.924ms] or [+146.148%; +146.820%]

scenario:Benchmarks.Trace.Asm.AppSecEncoderBenchmark.EncodeLegacyArgs net6.0

🟥 execution_time [+291.863ms; +295.182ms] or [+142.681%; +144.303%]

scenario:Benchmarks.Trace.Asm.AppSecEncoderBenchmark.EncodeLegacyArgs netcoreapp3.1

🟥 execution_time [+298.057ms; +300.793ms] or [+148.969%; +150.336%]

scenario:Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWafRealisticBenchmarkWithAttack net6.0

🟥 execution_time [+22.932µs; +46.606µs] or [+7.321%; +14.879%]
🟥 throughput [-433.399op/s; -234.236op/s] or [-13.510%; -7.302%]

scenario:Benchmarks.Trace.AspNetCoreBenchmark.SendRequest net472

🟥 execution_time [+299.756ms; +300.519ms] or [+149.609%; +149.990%]

scenario:Benchmarks.Trace.AspNetCoreBenchmark.SendRequest net6.0

unstable execution_time [+332.283ms; +390.367ms] or [+361.039%; +424.150%]
🟩 throughput [+930.953op/s; +1129.115op/s] or [+7.650%; +9.278%]

scenario:Benchmarks.Trace.AspNetCoreBenchmark.SendRequest netcoreapp3.1

🟥 execution_time [+367.911ms; +371.433ms] or [+279.351%; +282.026%]

scenario:Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces net472

unstable execution_time [+321.500ms; +376.179ms] or [+147.823%; +172.963%]
🟥 throughput [-508.698op/s; -464.563op/s] or [-46.093%; -42.094%]

scenario:Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces net6.0

unstable execution_time [+202.405ms; +335.807ms] or [+86.256%; +143.107%]
🟥 throughput [-689.891op/s; -605.843op/s] or [-46.016%; -40.410%]

scenario:Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces netcoreapp3.1

🟥 execution_time [+329.368ms; +336.792ms] or [+197.000%; +201.441%]
🟥 throughput [-378.885op/s; -343.441op/s] or [-26.381%; -23.913%]

scenario:Benchmarks.Trace.CharSliceBenchmark.OptimizedCharSliceWithPool net6.0

🟩 throughput [+47.247op/s; +67.614op/s] or [+5.094%; +7.290%]

scenario:Benchmarks.Trace.CharSliceBenchmark.OriginalCharSlice net6.0

🟩 throughput [+26.944op/s; +43.640op/s] or [+5.319%; +8.615%]

scenario:Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearch net472

🟥 execution_time [+300.504ms; +303.547ms] or [+151.328%; +152.861%]

scenario:Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearch net6.0

🟥 execution_time [+300.612ms; +301.959ms] or [+150.637%; +151.312%]

scenario:Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearch netcoreapp3.1

🟥 execution_time [+301.075ms; +305.710ms] or [+151.247%; +153.576%]

scenario:Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearchAsync net472

🟥 execution_time [+302.615ms; +303.924ms] or [+151.963%; +152.620%]

scenario:Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearchAsync net6.0

🟥 execution_time [+297.338ms; +298.972ms] or [+147.020%; +147.828%]

scenario:Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearchAsync netcoreapp3.1

🟥 execution_time [+302.021ms; +305.601ms] or [+153.078%; +154.892%]

scenario:Benchmarks.Trace.GraphQLBenchmark.ExecuteAsync net472

🟥 execution_time [+300.191ms; +302.149ms] or [+150.669%; +151.652%]

scenario:Benchmarks.Trace.GraphQLBenchmark.ExecuteAsync net6.0

🟥 execution_time [+300.880ms; +302.981ms] or [+149.961%; +151.008%]
🟩 throughput [+45360.208op/s; +50581.735op/s] or [+9.007%; +10.044%]

scenario:Benchmarks.Trace.GraphQLBenchmark.ExecuteAsync netcoreapp3.1

🟥 execution_time [+301.611ms; +304.500ms] or [+150.049%; +151.486%]

scenario:Benchmarks.Trace.ILoggerBenchmark.EnrichedLog net6.0

🟩 execution_time [-16.297ms; -12.606ms] or [-7.578%; -5.862%]
🟩 throughput [+23891.331op/s; +30597.357op/s] or [+6.554%; +8.394%]

scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatAspectBenchmark net472

unstable execution_time [+14.387µs; +57.667µs] or [+3.554%; +14.244%]

scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatAspectBenchmark net6.0

🟩 allocated_mem [-25.540KB; -25.516KB] or [-9.316%; -9.308%]
unstable execution_time [-61.126µs; -7.347µs] or [-12.081%; -1.452%]

scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatAspectBenchmark netcoreapp3.1

unstable execution_time [-49.618µs; +12.189µs] or [-8.599%; +2.112%]

scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatBenchmark net6.0

unstable execution_time [+6.739µs; +11.536µs] or [+15.929%; +27.267%]
🟥 throughput [-5065.524op/s; -3138.179op/s] or [-21.324%; -13.211%]

scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatBenchmark netcoreapp3.1

unstable execution_time [-13.686µs; -5.912µs] or [-21.233%; -9.172%]
unstable throughput [+1431.958op/s; +3102.722op/s] or [+8.786%; +19.036%]

scenario:Benchmarks.Trace.Log4netBenchmark.EnrichedLog net472

🟥 execution_time [+301.314ms; +302.354ms] or [+152.301%; +152.826%]

scenario:Benchmarks.Trace.Log4netBenchmark.EnrichedLog net6.0

🟥 execution_time [+302.597ms; +305.005ms] or [+154.021%; +155.247%]

scenario:Benchmarks.Trace.Log4netBenchmark.EnrichedLog netcoreapp3.1

🟥 execution_time [+300.485ms; +302.928ms] or [+150.430%; +151.653%]

scenario:Benchmarks.Trace.SerilogBenchmark.EnrichedLog net472

🟥 execution_time [+298.608ms; +300.580ms] or [+148.829%; +149.812%]

scenario:Benchmarks.Trace.SerilogBenchmark.EnrichedLog net6.0

🟥 execution_time [+301.954ms; +303.215ms] or [+151.627%; +152.260%]

scenario:Benchmarks.Trace.SerilogBenchmark.EnrichedLog netcoreapp3.1

🟥 execution_time [+303.442ms; +305.722ms] or [+153.886%; +155.043%]

scenario:Benchmarks.Trace.SingleSpanAspNetCoreBenchmark.SingleSpanAspNetCore net472

🟥 execution_time [+299.833ms; +300.838ms] or [+149.558%; +150.060%]
🟩 throughput [+61017629.045op/s; +61369280.650op/s] or [+44.437%; +44.693%]

scenario:Benchmarks.Trace.SingleSpanAspNetCoreBenchmark.SingleSpanAspNetCore net6.0

🟥 execution_time [+417.879ms; +420.894ms] or [+519.707%; +523.457%]

scenario:Benchmarks.Trace.SingleSpanAspNetCoreBenchmark.SingleSpanAspNetCore netcoreapp3.1

🟥 execution_time [+299.718ms; +300.978ms] or [+149.493%; +150.121%]

scenario:Benchmarks.Trace.SpanBenchmark.StartFinishScope net6.0

🟩 throughput [+100036.307op/s; +109981.833op/s] or [+9.340%; +10.269%]

scenario:Benchmarks.Trace.SpanBenchmark.StartFinishScope netcoreapp3.1

🟩 throughput [+57347.849op/s; +78244.762op/s] or [+6.638%; +9.057%]

scenario:Benchmarks.Trace.SpanBenchmark.StartFinishSpan net6.0

🟩 throughput [+90871.285op/s; +121792.984op/s] or [+7.034%; +9.427%]

scenario:Benchmarks.Trace.SpanBenchmark.StartFinishSpan netcoreapp3.1

🟩 throughput [+90842.892op/s; +97934.267op/s] or [+9.022%; +9.726%]

scenario:Benchmarks.Trace.SpanBenchmark.StartFinishTwoScopes net6.0

🟩 throughput [+41475.635op/s; +49252.899op/s] or [+7.531%; +8.943%]

scenario:Benchmarks.Trace.TraceAnnotationsBenchmark.RunOnMethodBegin net6.0

🟩 throughput [+68230.739op/s; +88728.172op/s] or [+7.623%; +9.913%]

Known flaky benchmarks without significant changes:

scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.ActivityBenchmark.StartSpan net472
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.ActivityBenchmark.StartSpan net6.0
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.ActivityBenchmark.StartSpan netcoreapp3.1
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.ActivityBenchmark.StartSpan_AddEvent_Sampled net472
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.ActivityBenchmark.StartSpan_AddEvent_Sampled net6.0
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.ActivityBenchmark.StartSpan_AddEvent_Sampled netcoreapp3.1
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.ActivityBenchmark.StartSpan_GetContext_Sampled net472
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.ActivityBenchmark.StartSpan_GetContext_Sampled net6.0
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.ActivityBenchmark.StartSpan_GetContext_Sampled netcoreapp3.1
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.ActivityBenchmark.StartSpan_SetAttributes_Sampled net472
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.ActivityBenchmark.StartSpan_SetAttributes_Sampled net6.0
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.ActivityBenchmark.StartSpan_SetAttributes_Sampled netcoreapp3.1
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.ActivityBenchmark.StartSpan_SetStatus_Sampled net472
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.ActivityBenchmark.StartSpan_SetStatus_Sampled net6.0
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.ActivityBenchmark.StartSpan_SetStatus_Sampled netcoreapp3.1
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.ActivityBenchmark.StartSpan_UpdateName_Sampled net472
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.ActivityBenchmark.StartSpan_UpdateName_Sampled net6.0
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.ActivityBenchmark.StartSpan_UpdateName_Sampled netcoreapp3.1
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.TelemetrySpanBenchmark.StartSpan net472
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.TelemetrySpanBenchmark.StartSpan net6.0
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.TelemetrySpanBenchmark.StartSpan netcoreapp3.1
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.TelemetrySpanBenchmark.StartSpan_AddEvent_Sampled net472
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.TelemetrySpanBenchmark.StartSpan_AddEvent_Sampled net6.0
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.TelemetrySpanBenchmark.StartSpan_AddEvent_Sampled netcoreapp3.1
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.TelemetrySpanBenchmark.StartSpan_GetContext_Sampled net472
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.TelemetrySpanBenchmark.StartSpan_GetContext_Sampled net6.0
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.TelemetrySpanBenchmark.StartSpan_GetContext_Sampled netcoreapp3.1
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.TelemetrySpanBenchmark.StartSpan_RecordException_Sampled net472
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.TelemetrySpanBenchmark.StartSpan_RecordException_Sampled net6.0
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.TelemetrySpanBenchmark.StartSpan_RecordException_Sampled netcoreapp3.1
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.TelemetrySpanBenchmark.StartSpan_SetAttributes_Sampled net472
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.TelemetrySpanBenchmark.StartSpan_SetAttributes_Sampled net6.0
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.TelemetrySpanBenchmark.StartSpan_SetAttributes_Sampled netcoreapp3.1
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.TelemetrySpanBenchmark.StartSpan_SetStatus_Sampled net472
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.TelemetrySpanBenchmark.StartSpan_SetStatus_Sampled net6.0
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.TelemetrySpanBenchmark.StartSpan_SetStatus_Sampled netcoreapp3.1
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.TelemetrySpanBenchmark.StartSpan_UpdateName_Sampled net472
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.TelemetrySpanBenchmark.StartSpan_UpdateName_Sampled net6.0
scenario:Benchmarks.OpenTelemetry.InstrumentedApi.Trace.TelemetrySpanBenchmark.StartSpan_UpdateName_Sampled netcoreapp3.1
scenario:Benchmarks.Trace.ActivityBenchmark.StartStopWithChild net6.0
scenario:Benchmarks.Trace.ActivityBenchmark.StartStopWithChild netcoreapp3.1
scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorMoreComplexBody net472
scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorSimpleBody net472
scenario:Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWafRealisticBenchmark net472
scenario:Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWafRealisticBenchmark net6.0
scenario:Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWafRealisticBenchmark netcoreapp3.1
scenario:Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWafRealisticBenchmarkWithAttack net472
scenario:Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWafRealisticBenchmarkWithAttack netcoreapp3.1
scenario:Benchmarks.Trace.CharSliceBenchmark.OptimizedCharSlice net472
scenario:Benchmarks.Trace.CharSliceBenchmark.OptimizedCharSlice net6.0
scenario:Benchmarks.Trace.CharSliceBenchmark.OptimizedCharSlice netcoreapp3.1
scenario:Benchmarks.Trace.CharSliceBenchmark.OptimizedCharSliceWithPool net472
scenario:Benchmarks.Trace.CharSliceBenchmark.OptimizedCharSliceWithPool netcoreapp3.1
scenario:Benchmarks.Trace.CharSliceBenchmark.OriginalCharSlice net472
scenario:Benchmarks.Trace.CharSliceBenchmark.OriginalCharSlice netcoreapp3.1
scenario:Benchmarks.Trace.ILoggerBenchmark.EnrichedLog net472
scenario:Benchmarks.Trace.ILoggerBenchmark.EnrichedLog netcoreapp3.1
scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatBenchmark net472
scenario:Benchmarks.Trace.RedisBenchmark.SendReceive net472
scenario:Benchmarks.Trace.RedisBenchmark.SendReceive net6.0
scenario:Benchmarks.Trace.RedisBenchmark.SendReceive netcoreapp3.1
scenario:Benchmarks.Trace.SpanBenchmark.StartFinishScope net472
scenario:Benchmarks.Trace.SpanBenchmark.StartFinishSpan net472
scenario:Benchmarks.Trace.SpanBenchmark.StartFinishTwoScopes net472
scenario:Benchmarks.Trace.SpanBenchmark.StartFinishTwoScopes netcoreapp3.1
scenario:Benchmarks.Trace.TraceAnnotationsBenchmark.RunOnMethodBegin net472
scenario:Benchmarks.Trace.TraceAnnotationsBenchmark.RunOnMethodBegin netcoreapp3.1

dd-trace-dotnet-ci-bot · 2026-06-29T12:44:41Z

Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing This PR (8845) and master.

✅ No regressions detected - check the details below

Full Metrics Comparison

FakeDbCommand

Metric	Master (Mean ± 95% CI)	Current (Mean ± 95% CI)	Change	Status
.NET Framework 4.8 - Baseline
duration	71.86 ± (71.80 - 72.26) ms	69.78 ± (69.81 - 70.10) ms	-2.9%	✅
.NET Framework 4.8 - Bailout
duration	74.17 ± (74.13 - 74.53) ms	74.03 ± (73.90 - 74.19) ms	-0.2%	✅
.NET Framework 4.8 - CallTarget+Inlining+NGEN
duration	1080.53 ± (1078.82 - 1084.59) ms	1082.80 ± (1082.37 - 1089.12) ms	+0.2%	✅⬆️
.NET Core 3.1 - Baseline
process.internal_duration_ms	22.45 ± (22.40 - 22.50) ms	22.00 ± (21.96 - 22.03) ms	-2.0%	✅
process.time_to_main_ms	82.87 ± (82.59 - 83.15) ms	80.61 ± (80.44 - 80.78) ms	-2.7%	✅
runtime.dotnet.exceptions.count	0 ± (0 - 0)	0 ± (0 - 0)	+0.0%	✅
runtime.dotnet.mem.committed	10.91 ± (10.91 - 10.91) MB	10.93 ± (10.92 - 10.93) MB	+0.1%	✅⬆️
runtime.dotnet.threads.count	12 ± (12 - 12)	12 ± (12 - 12)	+0.0%	✅
.NET Core 3.1 - Bailout
process.internal_duration_ms	22.09 ± (22.06 - 22.13) ms	21.88 ± (21.85 - 21.91) ms	-1.0%	✅
process.time_to_main_ms	81.71 ± (81.58 - 81.84) ms	81.61 ± (81.48 - 81.74) ms	-0.1%	✅
runtime.dotnet.exceptions.count	0 ± (0 - 0)	0 ± (0 - 0)	+0.0%	✅
runtime.dotnet.mem.committed	10.95 ± (10.95 - 10.96) MB	10.97 ± (10.96 - 10.97) MB	+0.2%	✅⬆️
runtime.dotnet.threads.count	13 ± (13 - 13)	13 ± (13 - 13)	+0.0%	✅
.NET Core 3.1 - CallTarget+Inlining+NGEN
process.internal_duration_ms	212.06 ± (211.17 - 212.96) ms	210.10 ± (209.12 - 211.09) ms	-0.9%	✅
process.time_to_main_ms	531.66 ± (530.49 - 532.83) ms	530.95 ± (529.49 - 532.41) ms	-0.1%	✅
runtime.dotnet.exceptions.count	0 ± (0 - 0)	0 ± (0 - 0)	+0.0%	✅
runtime.dotnet.mem.committed	49.18 ± (49.15 - 49.21) MB	49.17 ± (49.14 - 49.21) MB	-0.0%	✅
runtime.dotnet.threads.count	28 ± (28 - 28)	28 ± (28 - 28)	+0.2%	✅⬆️
.NET 6 - Baseline
process.internal_duration_ms	20.81 ± (20.78 - 20.84) ms	21.02 ± (20.99 - 21.05) ms	+1.0%	✅⬆️
process.time_to_main_ms	69.69 ± (69.57 - 69.81) ms	70.27 ± (70.11 - 70.43) ms	+0.8%	✅⬆️
runtime.dotnet.exceptions.count	0 ± (0 - 0)	0 ± (0 - 0)	+0.0%	✅
runtime.dotnet.mem.committed	10.63 ± (10.63 - 10.63) MB	10.64 ± (10.64 - 10.64) MB	+0.1%	✅⬆️
runtime.dotnet.threads.count	10 ± (10 - 10)	10 ± (10 - 10)	+0.0%	✅
.NET 6 - Bailout
process.internal_duration_ms	20.76 ± (20.72 - 20.80) ms	20.93 ± (20.89 - 20.96) ms	+0.8%	✅⬆️
process.time_to_main_ms	70.64 ± (70.52 - 70.77) ms	70.89 ± (70.78 - 71.01) ms	+0.4%	✅⬆️
runtime.dotnet.exceptions.count	0 ± (0 - 0)	0 ± (0 - 0)	+0.0%	✅
runtime.dotnet.mem.committed	10.75 ± (10.74 - 10.75) MB	10.76 ± (10.76 - 10.76) MB	+0.1%	✅⬆️
runtime.dotnet.threads.count	11 ± (11 - 11)	11 ± (11 - 11)	+0.0%	✅
.NET 6 - CallTarget+Inlining+NGEN
process.internal_duration_ms	370.61 ± (368.17 - 373.06) ms	371.80 ± (369.52 - 374.09) ms	+0.3%	✅⬆️
process.time_to_main_ms	536.79 ± (535.64 - 537.93) ms	538.17 ± (537.01 - 539.33) ms	+0.3%	✅⬆️
runtime.dotnet.exceptions.count	0 ± (0 - 0)	0 ± (0 - 0)	+0.0%	✅
runtime.dotnet.mem.committed	50.33 ± (50.31 - 50.36) MB	50.23 ± (50.21 - 50.25) MB	-0.2%	✅
runtime.dotnet.threads.count	28 ± (28 - 28)	28 ± (28 - 28)	-0.1%	✅
.NET 8 - Baseline
process.internal_duration_ms	19.48 ± (19.43 - 19.54) ms	19.34 ± (19.30 - 19.38) ms	-0.7%	✅
process.time_to_main_ms	71.71 ± (71.46 - 71.96) ms	71.07 ± (70.80 - 71.35) ms	-0.9%	✅
runtime.dotnet.exceptions.count	0 ± (0 - 0)	0 ± (0 - 0)	+0.0%	✅
runtime.dotnet.mem.committed	7.68 ± (7.67 - 7.68) MB	7.70 ± (7.69 - 7.70) MB	+0.3%	✅⬆️
runtime.dotnet.threads.count	10 ± (10 - 10)	10 ± (10 - 10)	+0.0%	✅
.NET 8 - Bailout
process.internal_duration_ms	19.14 ± (19.11 - 19.17) ms	19.24 ± (19.20 - 19.27) ms	+0.5%	✅⬆️
process.time_to_main_ms	70.53 ± (70.37 - 70.69) ms	71.98 ± (71.76 - 72.21) ms	+2.1%	✅⬆️
runtime.dotnet.exceptions.count	0 ± (0 - 0)	0 ± (0 - 0)	+0.0%	✅
runtime.dotnet.mem.committed	7.72 ± (7.72 - 7.73) MB	7.73 ± (7.73 - 7.74) MB	+0.2%	✅⬆️
runtime.dotnet.threads.count	11 ± (11 - 11)	11 ± (11 - 11)	+0.0%	✅
.NET 8 - CallTarget+Inlining+NGEN
process.internal_duration_ms	299.62 ± (297.52 - 301.73) ms	297.28 ± (294.95 - 299.61) ms	-0.8%	✅
process.time_to_main_ms	485.90 ± (484.92 - 486.88) ms	484.69 ± (483.71 - 485.68) ms	-0.2%	✅
runtime.dotnet.exceptions.count	0 ± (0 - 0)	0 ± (0 - 0)	+0.0%	✅
runtime.dotnet.mem.committed	37.70 ± (37.67 - 37.73) MB	37.73 ± (37.70 - 37.76) MB	+0.1%	✅⬆️
runtime.dotnet.threads.count	27 ± (27 - 27)	27 ± (27 - 27)	+0.1%	✅⬆️

HttpMessageHandler

Metric	Master (Mean ± 95% CI)	Current (Mean ± 95% CI)	Change	Status
.NET Framework 4.8 - Baseline
duration	202.80 ± (202.62 - 203.49) ms	201.74 ± (201.44 - 202.34) ms	-0.5%	✅
.NET Framework 4.8 - Bailout
duration	205.86 ± (205.56 - 206.43) ms	206.28 ± (206.02 - 206.69) ms	+0.2%	✅⬆️
.NET Framework 4.8 - CallTarget+Inlining+NGEN
duration	1208.34 ± (1207.02 - 1212.60) ms	1211.91 ± (1212.73 - 1221.40) ms	+0.3%	✅⬆️
.NET Core 3.1 - Baseline
process.internal_duration_ms	195.99 ± (195.48 - 196.50) ms	196.89 ± (196.44 - 197.35) ms	+0.5%	✅⬆️
process.time_to_main_ms	84.97 ± (84.72 - 85.22) ms	86.06 ± (85.74 - 86.39) ms	+1.3%	✅⬆️
runtime.dotnet.exceptions.count	3 ± (3 - 3)	3 ± (3 - 3)	+0.0%	✅
runtime.dotnet.mem.committed	16.07 ± (16.05 - 16.09) MB	16.01 ± (15.99 - 16.03) MB	-0.4%	✅
runtime.dotnet.threads.count	20 ± (20 - 20)	20 ± (20 - 20)	-0.3%	✅
.NET Core 3.1 - Bailout
process.internal_duration_ms	196.79 ± (196.43 - 197.16) ms	197.67 ± (197.28 - 198.05) ms	+0.4%	✅⬆️
process.time_to_main_ms	87.08 ± (86.87 - 87.30) ms	87.26 ± (86.98 - 87.54) ms	+0.2%	✅⬆️
runtime.dotnet.exceptions.count	3 ± (3 - 3)	3 ± (3 - 3)	+0.0%	✅
runtime.dotnet.mem.committed	16.10 ± (16.08 - 16.12) MB	16.08 ± (16.05 - 16.10) MB	-0.1%	✅
runtime.dotnet.threads.count	21 ± (20 - 21)	21 ± (21 - 21)	+0.9%	✅⬆️
.NET Core 3.1 - CallTarget+Inlining+NGEN
process.internal_duration_ms	388.94 ± (387.48 - 390.41) ms	387.06 ± (386.02 - 388.11) ms	-0.5%	✅
process.time_to_main_ms	545.82 ± (544.71 - 546.93) ms	545.27 ± (544.30 - 546.23) ms	-0.1%	✅
runtime.dotnet.exceptions.count	3 ± (3 - 3)	3 ± (3 - 3)	+0.0%	✅
runtime.dotnet.mem.committed	58.46 ± (58.24 - 58.68) MB	58.02 ± (57.82 - 58.22) MB	-0.8%	✅
runtime.dotnet.threads.count	30 ± (30 - 30)	30 ± (30 - 30)	+0.0%	✅⬆️
.NET 6 - Baseline
process.internal_duration_ms	201.76 ± (201.32 - 202.21) ms	201.04 ± (200.65 - 201.43) ms	-0.4%	✅
process.time_to_main_ms	74.79 ± (74.50 - 75.07) ms	74.21 ± (73.99 - 74.44) ms	-0.8%	✅
runtime.dotnet.exceptions.count	4 ± (4 - 4)	4 ± (4 - 4)	+0.0%	✅
runtime.dotnet.mem.committed	16.36 ± (16.32 - 16.39) MB	16.37 ± (16.35 - 16.39) MB	+0.1%	✅⬆️
runtime.dotnet.threads.count	19 ± (19 - 19)	19 ± (19 - 19)	-0.2%	✅
.NET 6 - Bailout
process.internal_duration_ms	201.17 ± (200.84 - 201.51) ms	200.39 ± (200.01 - 200.76) ms	-0.4%	✅
process.time_to_main_ms	75.51 ± (75.35 - 75.68) ms	75.16 ± (74.94 - 75.38) ms	-0.5%	✅
runtime.dotnet.exceptions.count	4 ± (4 - 4)	4 ± (4 - 4)	+0.0%	✅
runtime.dotnet.mem.committed	16.44 ± (16.41 - 16.47) MB	16.41 ± (16.38 - 16.45) MB	-0.2%	✅
runtime.dotnet.threads.count	20 ± (20 - 20)	20 ± (20 - 20)	-0.4%	✅
.NET 6 - CallTarget+Inlining+NGEN
process.internal_duration_ms	583.54 ± (580.96 - 586.13) ms	583.46 ± (581.17 - 585.76) ms	-0.0%	✅
process.time_to_main_ms	553.95 ± (552.99 - 554.90) ms	556.12 ± (554.93 - 557.32) ms	+0.4%	✅⬆️
runtime.dotnet.exceptions.count	4 ± (4 - 4)	4 ± (4 - 4)	+0.0%	✅
runtime.dotnet.mem.committed	61.42 ± (61.34 - 61.50) MB	61.35 ± (61.26 - 61.44) MB	-0.1%	✅
runtime.dotnet.threads.count	31 ± (31 - 31)	31 ± (31 - 31)	-0.5%	✅
.NET 8 - Baseline
process.internal_duration_ms	200.04 ± (199.64 - 200.44) ms	198.63 ± (198.21 - 199.04) ms	-0.7%	✅
process.time_to_main_ms	74.02 ± (73.75 - 74.29) ms	73.46 ± (73.23 - 73.69) ms	-0.8%	✅
runtime.dotnet.exceptions.count	4 ± (4 - 4)	4 ± (4 - 4)	+0.0%	✅
runtime.dotnet.mem.committed	11.72 ± (11.70 - 11.74) MB	11.75 ± (11.73 - 11.77) MB	+0.2%	✅⬆️
runtime.dotnet.threads.count	19 ± (18 - 19)	19 ± (18 - 19)	-0.1%	✅
.NET 8 - Bailout
process.internal_duration_ms	198.91 ± (198.48 - 199.35) ms	197.98 ± (197.56 - 198.39) ms	-0.5%	✅
process.time_to_main_ms	74.95 ± (74.76 - 75.15) ms	74.61 ± (74.40 - 74.82) ms	-0.5%	✅
runtime.dotnet.exceptions.count	4 ± (4 - 4)	4 ± (4 - 4)	+0.0%	✅
runtime.dotnet.mem.committed	11.77 ± (11.75 - 11.80) MB	11.75 ± (11.73 - 11.77) MB	-0.2%	✅
runtime.dotnet.threads.count	20 ± (19 - 20)	19 ± (19 - 19)	-1.3%	✅
.NET 8 - CallTarget+Inlining+NGEN
process.internal_duration_ms	512.08 ± (509.20 - 514.96) ms	513.60 ± (510.45 - 516.75) ms	+0.3%	✅⬆️
process.time_to_main_ms	506.86 ± (506.09 - 507.62) ms	504.10 ± (503.21 - 504.99) ms	-0.5%	✅
runtime.dotnet.exceptions.count	4 ± (4 - 4)	4 ± (4 - 4)	+0.0%	✅
runtime.dotnet.mem.committed	51.17 ± (51.13 - 51.21) MB	51.16 ± (51.12 - 51.20) MB	-0.0%	✅
runtime.dotnet.threads.count	30 ± (30 - 30)	30 ± (30 - 30)	-0.1%	✅

Comparison explanation

Execution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than the latest master results are highlighted in red. The following thresholds were used for comparing the execution times:

Welch's t-test at a 5% significance level
Only results indicating a difference greater than 5% and 5 ms are considered.
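A minimal Python sketch of how these two thresholds might combine is shown below. It is not the CI bot's code: `welch_t` and `is_flagged` are hypothetical names, the p-value is taken as an input (it would come from a t-distribution with the computed degrees of freedom), and the sketch assumes that only slowdowns are flagged.

```python
import math
from statistics import mean, variance


def welch_t(master_ms, pr_ms):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(master_ms) / len(master_ms)
    vb = variance(pr_ms) / len(pr_ms)
    t = (mean(pr_ms) - mean(master_ms)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(master_ms) - 1) + vb ** 2 / (len(pr_ms) - 1)
    )
    return t, df


def is_flagged(master_ms, pr_ms, p_value,
               alpha=0.05, min_rel=0.05, min_abs_ms=5.0):
    """Flag a slowdown only if it is significant AND exceeds both 5% and 5 ms."""
    diff = mean(pr_ms) - mean(master_ms)
    return (
        p_value < alpha
        and diff > min_abs_ms
        and diff / mean(master_ms) > min_rel
    )
```

Requiring both a relative and an absolute floor keeps very short runs, where a few milliseconds is a large percentage, from being flagged on noise alone.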

Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard.

Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).

Duration charts

FakeDbCommand (.NET Framework 4.8)

gantt
    title Execution time (ms) FakeDbCommand (.NET Framework 4.8)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8845) - mean (70ms)  : 68, 72
    master - mean (72ms)  : 69, 75

    section Bailout
    This PR (8845) - mean (74ms)  : 73, 75
    master - mean (74ms)  : 72, 77

    section CallTarget+Inlining+NGEN
    This PR (8845) - mean (1,086ms)  : 1036, 1135
    master - mean (1,082ms)  : 1039, 1124

FakeDbCommand (.NET Core 3.1)

gantt
    title Execution time (ms) FakeDbCommand (.NET Core 3.1)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8845) - mean (109ms)  : 106, 112
    master - mean (113ms)  : 107, 119

    section Bailout
    This PR (8845) - mean (110ms)  : 108, 112
    master - mean (110ms)  : 108, 113

    section CallTarget+Inlining+NGEN
    This PR (8845) - mean (778ms)  : 759, 798
    master - mean (781ms)  : 757, 805

FakeDbCommand (.NET 6)

gantt
    title Execution time (ms) FakeDbCommand (.NET 6)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8845) - mean (97ms)  : 93, 101
    master - mean (96ms)  : 93, 99

    section Bailout
    This PR (8845) - mean (98ms)  : 96, 99
    master - mean (98ms)  : 95, 100

    section CallTarget+Inlining+NGEN
    This PR (8845) - mean (945ms)  : 903, 988
    master - mean (940ms)  : 898, 983

FakeDbCommand (.NET 8)

gantt
    title Execution time (ms) FakeDbCommand (.NET 8)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8845) - mean (98ms)  : 91, 104
    master - mean (99ms)  : 92, 105

    section Bailout
    This PR (8845) - mean (99ms)  : 93, 104
    master - mean (96ms)  : 93, 100

    section CallTarget+Inlining+NGEN
    This PR (8845) - mean (812ms)  : 778, 846
    master - mean (814ms)  : 781, 847

HttpMessageHandler (.NET Framework 4.8)

gantt
    title Execution time (ms) HttpMessageHandler (.NET Framework 4.8)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8845) - mean (202ms)  : 197, 206
    master - mean (203ms)  : 198, 208

    section Bailout
    This PR (8845) - mean (206ms)  : 203, 210
    master - mean (206ms)  : 202, 210

    section CallTarget+Inlining+NGEN
    This PR (8845) - mean (1,217ms)  : 1151, 1283
    master - mean (1,210ms)  : 1173, 1246

HttpMessageHandler (.NET Core 3.1)

gantt
    title Execution time (ms) HttpMessageHandler (.NET Core 3.1)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8845) - mean (293ms)  : 286, 300
    master - mean (291ms)  : 283, 298

    section Bailout
    This PR (8845) - mean (295ms)  : 289, 301
    master - mean (294ms)  : 289, 298

    section CallTarget+Inlining+NGEN
    This PR (8845) - mean (973ms)  : 955, 992
    master - mean (975ms)  : 952, 998

HttpMessageHandler (.NET 6)

gantt
    title Execution time (ms) HttpMessageHandler (.NET 6)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8845) - mean (285ms)  : 278, 292
    master - mean (286ms)  : 278, 294

    section Bailout
    This PR (8845) - mean (285ms)  : 280, 290
    master - mean (286ms)  : 282, 290

    section CallTarget+Inlining+NGEN
    This PR (8845) - mean (1,173ms)  : 1140, 1207
    master - mean (1,171ms)  : 1132, 1210

HttpMessageHandler (.NET 8)

gantt
    title Execution time (ms) HttpMessageHandler (.NET 8)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8845) - mean (282ms)  : 276, 289
    master - mean (284ms)  : 277, 291

    section Bailout
    This PR (8845) - mean (283ms)  : 278, 287
    master - mean (284ms)  : 278, 290

    section CallTarget+Inlining+NGEN
    This PR (8845) - mean (1,050ms)  : 1004, 1095
    master - mean (1,051ms)  : 1005, 1097

pr-commenter · 2026-06-29T15:56:50Z

LLM Validation

LLM Validation Gate — dotnet-tracer-agent

✅ PASS

No blocking-case regressions; the quality change is within noise (baseline/candidate confidence intervals overlap).
Advisory: the candidate introduced a bad signal on 1 case(s) that the baseline did not — surfaced for review (see Cases); it does not block on its own.

Analysis

Changed instruction file(s): AGENTS.md.

1 case(s) dipped but stayed within the pass bar.

dotnet-tracer-confirm-instrumentation-011: won 47% of blind comparisons, quality -1.5. ⚠️ Introduced a new bad signal (advisory).
- Asked: How can I confirm, from this repo and its runtime logs, whether a given library method is auto-instrumented — both where the instrumentation is declared in
- Tripped 1 bad signal(s):
  - Invents a configuration file that lists instrumented methods
- Baseline answer: Here's how to confirm whether a given library method is auto-instrumented in dd-trace-dotnet — the source-side declaration, and the runtime log lines that prove it fired. ## Part 1 — Where instrumentation is declared in source Auto-instrumentation is declared with attributes that a Roslyn source generator compiles
- Candidate answer: Here's how to answer this end-to-end, grounded in this repo. ## 1. Where instrumentation is declared in source Auto-instrumentation for a given method is declared with an [InstrumentMethod] attribute on an integration class, under tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/<Area>/. The attribute

Results

Pairwise win-rate: 50% [46%–54%] — candidate's share of blind comparisons (90% CI; spanning 50% = no clear difference)
Overall quality: 88.3 → 88.3 (/100, 0.0)
Bad signals introduced (advisory): 1
Blocking-case regressions: 0
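
To make the "win-rate with a 90% CI" line concrete, here is a small sketch that turns a count of won comparisons into a rate and an interval. The Wilson score interval is an assumption chosen for illustration; the comment does not say which interval the platform uses, and the handling of ties is left out.

```python
import math
from statistics import NormalDist


def win_rate_ci(wins, comparisons, confidence=0.90):
    """Return (win_rate, low, high) using a Wilson score interval."""
    if comparisons <= 0:
        raise ValueError("need at least one comparison")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = wins / comparisons
    denom = 1 + z * z / comparisons
    center = (p + z * z / (2 * comparisons)) / denom
    half = z * math.sqrt(
        p * (1 - p) / comparisons + z * z / (4 * comparisons ** 2)
    ) / denom
    return p, center - half, center + half


def spans_even(low, high):
    """True when the interval contains 50%, i.e. no clear difference."""
    return low <= 0.5 <= high
```

An interval that contains 50%, like the 50% [46%–54%] reported above, is read as no clear difference between baseline and candidate.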

Cases

Case	Mode	Quality Δ	Win-rate (90% CI)	Safety
dotnet-tracer-repo-nav-integration-001	block	-0.1	47% [42%–52%]	ok
dotnet-tracer-logging-terminology-004	block	-1.0	50% [42%–58%]	ok
dotnet-tracer-nuget-scope-hallucination-008	block	-1.1	50% [50%–50%]	ok
dotnet-tracer-confirm-instrumentation-011	block	-1.5	47% [33%–61%]	⚠️ bad signal
dotnet-tracer-control-context-propagation-016	warn	+3.8	56% [49%–63%]	ok

Per-dimension scores, token usage, latency, and estimated cost are in the CI job logs.
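
The table above can be read as input to a simple decision rule. The sketch below is a guess at such a rule that reproduces this comment's outcome (PASS with one advisory); the real gate policy lives in .llm-validation/config.yaml and the platform repo, neither of which is shown here, so `CaseResult` and `verdict` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    case_id: str
    mode: str            # "block" or "warn", as in the Mode column
    ci_low: float        # lower bound of the 90% win-rate CI
    ci_high: float       # upper bound of the 90% win-rate CI
    new_bad_signal: bool = False


def verdict(cases):
    """Assumed rule: a case is a confident loss when its whole CI is below 50%."""
    losses = [c for c in cases if c.ci_high < 0.5]
    advisories = [c.case_id for c in cases if c.new_bad_signal]
    if any(c.mode == "block" for c in losses):
        return "FAIL", advisories
    if losses:
        return "WARN", advisories
    return "PASS", advisories


# Rows copied from the Cases table above.
CASES = [
    CaseResult("dotnet-tracer-repo-nav-integration-001", "block", 0.42, 0.52),
    CaseResult("dotnet-tracer-logging-terminology-004", "block", 0.42, 0.58),
    CaseResult("dotnet-tracer-nuget-scope-hallucination-008", "block", 0.50, 0.50),
    CaseResult("dotnet-tracer-confirm-instrumentation-011", "block", 0.33, 0.61, True),
    CaseResult("dotnet-tracer-control-context-propagation-016", "warn", 0.49, 0.63),
]
```

Under this assumed rule, `verdict(CASES)` gives PASS and lists case 011 as an advisory, which matches the comment: every interval reaches 50%, so nothing counts as a confident regression.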

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3d3520a32a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-07-03T10:59:46Z

+      through everything I have to update in this repo and how to verify it.
+    expected_criteria:
+      - Updates BOTH version pins — build/cmake/FindLibdatadog.cmake (Linux/macOS) and the Windows vcpkg port (vcpkg.json + portfile.cmake).
+      - FindLibdatadog.cmake uses SHA-256 hashes and a v-prefixed version (e.g. v32.0.0); the vcpkg version-string has NO v prefix.


Use the libdatadog-dotnet version in this benchmark

When .claude/skills/bump-libdatadog/SKILL.md changes, this is the blocking skill case, but the example here uses upstream libdatadog v32.0.0; the existing skill explicitly says the repo pins the separate libdatadog-dotnet release version, and the current CMake pin is v2.0.0. This can make the judge reward an answer that bumps to the wrong GitHub release tag (or penalize the correct upstream-vs-dotnet distinction), so the criterion should use a libdatadog-dotnet example and call out that upstream versions are not the pinned value.

chatgpt-codex-connector · 2026-07-03T10:59:46Z

+# List more than one to validate the combined instruction set (e.g. AGENTS.md + a Claude skill file).
+instruction_files:
+  - AGENTS.md
+  - .claude/skills/bump-libdatadog/SKILL.md   # also gate the libdatadog-bump skill


Gate the skill's helper script too

When only .claude/skills/bump-libdatadog/scripts/fetch-release-hashes.sh changes, the comments here say the CI trigger derives its watch paths from instruction_files, but the only watched skill path is SKILL.md. A broken checksum-fetching script would therefore bypass dotnet-tracer-bump-libdatadog-013 even though the skill tells agents to run it, so include the whole skill directory or the script path in the watched inputs.
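
The gap described above can be shown with a small path-matching sketch. This is not the platform's trigger logic (which lives in the platform repo and is not shown here); `is_watched` and `gate_should_run` are hypothetical helpers that treat each watch entry as either an exact file or a directory prefix.

```python
from pathlib import PurePosixPath


def is_watched(changed_file, watch_paths):
    """True if the file equals a watch entry or sits under a watched directory."""
    changed = PurePosixPath(changed_file)
    for entry in watch_paths:
        watched = PurePosixPath(entry)
        if changed == watched or watched in changed.parents:
            return True
    return False


def gate_should_run(changed_files, watch_paths):
    """True if any changed file is covered by the watch list."""
    return any(is_watched(f, watch_paths) for f in changed_files)


SCRIPT = ".claude/skills/bump-libdatadog/scripts/fetch-release-hashes.sh"
FILES_ONLY = ["AGENTS.md", ".claude/skills/bump-libdatadog/SKILL.md"]
WHOLE_SKILL_DIR = ["AGENTS.md", ".claude/skills/bump-libdatadog"]
```

With `FILES_ONLY`, a change to the helper script alone does not trigger the gate; with `WHOLE_SKILL_DIR`, it does.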

chatgpt-codex-connector · 2026-07-03T10:59:46Z

+  # footprint in this repo is the local .llm-validation/ suite. Pinned to the platform repo's `main`, which
+  # carries ci/ + the CLI on the gitlab.ddbuild.io mirror. See that repo's ci/README.md for configuration.
+  - project: 'DataDog/llm-validation-platform'
+    ref: main


Pin the external CI template to an immutable ref

For every pipeline after this merge, this include follows the platform repo's moving main branch, so a later template change can alter or break dd-trace-dotnet's merge gate without any change in this repo. Since this job can block PRs, use a tag or commit SHA and bump it deliberately instead of relying on a mutable branch.

Basic infra

004d2ec

NachoEchevarria added the area:builds (project files, build scripts, pipelines, versioning, releases, packages) label Jun 29, 2026

NachoEchevarria and others added 3 commits June 29, 2026 15:19

Merge branch 'master' into nacho/LLMPlatformJob

17a6a64

test a small change

f5d978c

Update dotnet-tracer-agent-v0.1.yaml

ab5c51c

NachoEchevarria and others added 5 commits June 30, 2026 11:21

Merge branch 'master' into nacho/LLMPlatformJob

ea6b6ad

Update config.yaml

96aae49

Restore default values

22f77a0

Merge branch 'master' into nacho/LLMPlatformJob

d7ed686

Update config.yaml

6553036

NachoEchevarria changed the title from "Basic infra" to "Add the LLM Validation gate for AGENTS.md + the libdatadog-bump skill" Jul 2, 2026

NachoEchevarria and others added 3 commits July 2, 2026 12:58

Cleanup

c6a6863

Update dotnet-tracer-agent-v0.1.yaml

6d1e987

Merge branch 'master' into nacho/LLMPlatformJob

3d3520a

NachoEchevarria marked this pull request as ready for review July 3, 2026 10:52

NachoEchevarria requested a review from a team as a code owner July 3, 2026 10:52

chatgpt-codex-connector Bot reviewed Jul 3, 2026

updates

94fd792

pr-commenter Bot commented Jun 29, 2026

Benchmarks

Explanation

More details about the CI and significant changes

scenario:Benchmarks.Trace.DbCommandBenchmark.ExecuteNonQuery net472

Known flaky benchmarks

scenario:Benchmarks.Trace.ActivityBenchmark.StartStopWithChild net472

scenario:Benchmarks.Trace.AgentWriterBenchmark.WriteAndFlushEnrichedTraces net472

scenario:Benchmarks.Trace.AgentWriterBenchmark.WriteAndFlushEnrichedTraces net6.0

scenario:Benchmarks.Trace.AgentWriterBenchmark.WriteAndFlushEnrichedTraces netcoreapp3.1

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleMoreComplexBody net472

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleMoreComplexBody net6.0

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleMoreComplexBody netcoreapp3.1

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleSimpleBody net472

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleSimpleBody net6.0

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleSimpleBody netcoreapp3.1

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorMoreComplexBody net6.0

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorMoreComplexBody netcoreapp3.1

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorSimpleBody net6.0

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorSimpleBody netcoreapp3.1

scenario:Benchmarks.Trace.Asm.AppSecEncoderBenchmark.EncodeArgs net472

scenario:Benchmarks.Trace.Asm.AppSecEncoderBenchmark.EncodeArgs net6.0

scenario:Benchmarks.Trace.Asm.AppSecEncoderBenchmark.EncodeArgs netcoreapp3.1

scenario:Benchmarks.Trace.Asm.AppSecEncoderBenchmark.EncodeLegacyArgs net472

scenario:Benchmarks.Trace.Asm.AppSecEncoderBenchmark.EncodeLegacyArgs net6.0

scenario:Benchmarks.Trace.Asm.AppSecEncoderBenchmark.EncodeLegacyArgs netcoreapp3.1

scenario:Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWafRealisticBenchmarkWithAttack net6.0

scenario:Benchmarks.Trace.AspNetCoreBenchmark.SendRequest net472

scenario:Benchmarks.Trace.AspNetCoreBenchmark.SendRequest net6.0

scenario:Benchmarks.Trace.AspNetCoreBenchmark.SendRequest netcoreapp3.1

scenario:Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces net472

scenario:Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces net6.0

scenario:Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces netcoreapp3.1

scenario:Benchmarks.Trace.CharSliceBenchmark.OptimizedCharSliceWithPool net6.0

scenario:Benchmarks.Trace.CharSliceBenchmark.OriginalCharSlice net6.0

scenario:Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearch net472

scenario:Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearch net6.0

scenario:Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearch netcoreapp3.1

scenario:Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearchAsync net472

scenario:Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearchAsync net6.0

scenario:Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearchAsync netcoreapp3.1

scenario:Benchmarks.Trace.GraphQLBenchmark.ExecuteAsync net472

scenario:Benchmarks.Trace.GraphQLBenchmark.ExecuteAsync net6.0

scenario:Benchmarks.Trace.GraphQLBenchmark.ExecuteAsync netcoreapp3.1

scenario:Benchmarks.Trace.ILoggerBenchmark.EnrichedLog net6.0

scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatAspectBenchmark net472

scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatAspectBenchmark net6.0

scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatAspectBenchmark netcoreapp3.1

scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatBenchmark net6.0

scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatBenchmark netcoreapp3.1

scenario:Benchmarks.Trace.Log4netBenchmark.EnrichedLog net472

scenario:Benchmarks.Trace.Log4netBenchmark.EnrichedLog net6.0

scenario:Benchmarks.Trace.Log4netBenchmark.EnrichedLog netcoreapp3.1

scenario:Benchmarks.Trace.SerilogBenchmark.EnrichedLog net472

scenario:Benchmarks.Trace.SerilogBenchmark.EnrichedLog net6.0

scenario:Benchmarks.Trace.SerilogBenchmark.EnrichedLog netcoreapp3.1

scenario:Benchmarks.Trace.SingleSpanAspNetCoreBenchmark.SingleSpanAspNetCore net472

scenario:Benchmarks.Trace.SingleSpanAspNetCoreBenchmark.SingleSpanAspNetCore net6.0

scenario:Benchmarks.Trace.SingleSpanAspNetCoreBenchmark.SingleSpanAspNetCore netcoreapp3.1

scenario:Benchmarks.Trace.SpanBenchmark.StartFinishScope net6.0

scenario:Benchmarks.Trace.SpanBenchmark.StartFinishScope netcoreapp3.1

scenario:Benchmarks.Trace.SpanBenchmark.StartFinishSpan net6.0

scenario:Benchmarks.Trace.SpanBenchmark.StartFinishSpan netcoreapp3.1

scenario:Benchmarks.Trace.SpanBenchmark.StartFinishTwoScopes net6.0

scenario:Benchmarks.Trace.TraceAnnotationsBenchmark.RunOnMethodBegin net6.0
