[EVAL] AI-generated Gson 1.6 instrumentation (blind test) by jordan-wong · Pull Request #10940 · DataDog/dd-trace-java

jordan-wong · 2026-03-23T16:54:41Z

Summary

AI-generated instrumentation for Gson 1.6 using the apm-instrumentation-toolkit. This is a blind test evaluation - the original implementation was deleted before generation to ensure zero contamination.

🎯 Evaluation Context

Purpose: Evaluate AI code generation quality without reference to existing implementation
Method: Shallow clone + dynamic config override (complete isolation)
Contamination: ✅ ZERO - verified via agent log analysis

📊 Generation Metrics

Metric	Value
Runtime	425.3s (7.1 minutes)
Agent turns	96
Cost	$3.29

✅ Layer 1 Validation (Automated)

All checks passed:

✅ compileJava
✅ spotlessCheck
✅ codenarcTest
✅ muzzle
✅ test
✅ latestDepTest

💡 Key Innovations

NEW: GsonHelper abstraction - Clean pattern for CallDepthThreadLocalMap
Broader method matchers - Catches all toJson/fromJson overloads
Consistent naming - methodEnter/methodExit throughout
Cleaner structure - Better code organization
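The GsonHelper item wraps CallDepthThreadLocalMap. dd-trace-java advice commonly uses that class so that only the outermost of several nested instrumented calls creates a span. This matters here because broader matchers also match toJson/fromJson overloads that delegate to one another, as Gson's public overloads largely do. The sketch below is self-contained and illustrative only. CallDepthGuard, FakeGson, and CallDepthDemo are hypothetical names, and the guard imitates the depth-counting idea rather than calling the real CallDepthThreadLocalMap API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a per-thread, per-class depth counter like the one GsonHelper wraps.
// It mirrors the idea behind CallDepthThreadLocalMap but is NOT that class or its API.
final class CallDepthGuard {
  private static final ThreadLocal<Map<Class<?>, Integer>> DEPTHS =
      ThreadLocal.withInitial(HashMap::new);

  private CallDepthGuard() {}

  /** Increments the depth for key and returns the depth before the increment (0 = outermost). */
  static int enter(Class<?> key) {
    Map<Class<?>, Integer> depths = DEPTHS.get();
    int previous = depths.getOrDefault(key, 0);
    depths.put(key, previous + 1);
    return previous;
  }

  /** Clears the counter; only the outermost call should do this. */
  static void reset(Class<?> key) {
    DEPTHS.get().remove(key);
  }
}

// Hypothetical serializer whose overloads delegate to each other, like Gson's toJson overloads.
final class FakeGson {
  static int spansStarted = 0;

  static String toJson(Object src) {
    return traced(() -> toJson(src, src.getClass()));
  }

  static String toJson(Object src, Class<?> type) {
    return traced(() -> String.valueOf(src));
  }

  // Plays the role of enter/exit advice: only the outermost call counts as a span.
  private static String traced(Supplier<String> body) {
    boolean outermost = CallDepthGuard.enter(FakeGson.class) == 0;
    try {
      if (outermost) {
        spansStarted++;
      }
      return body.get();
    } finally {
      if (outermost) {
        CallDepthGuard.reset(FakeGson.class);
      }
    }
  }
}

class CallDepthDemo {
  public static void main(String[] args) {
    // Two nested overload calls, but only one "span": prints "hello spans=1"
    System.out.println(FakeGson.toJson("hello") + " spans=" + FakeGson.spansStarted);
  }
}
```

Inner calls increment the depth but never decrement it. The outermost call resets the counter, so an exception thrown deep inside a nested call cannot leave the counter stuck at a non-zero value.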

📉 Known Regressions vs Original

⚠️ Missing span metadata - No source/target type tags (HIGH severity)
⚠️ No ClassLoader matcher - Missing version safety check (MEDIUM severity)
⚠️ Simplified tests - 40% fewer test cases (LOW severity)
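The HIGH-severity regression is the missing source/target type metadata. The sketch below shows one way to derive those values from the toJson/fromJson arguments. It assumes hypothetical tag keys gson.source_type and gson.target_type. Neither the original implementation's tag names nor the agent's span API appear in this PR, so the sketch only computes the strings that exit advice would set on a span.

```java
import java.lang.reflect.Type;

// Hypothetical helper for restoring source/target type metadata on Gson spans.
// The tag keys are illustrative; the original integration's tag names are not shown in this PR.
final class GsonTypeTags {
  static final String SOURCE_TYPE_TAG = "gson.source_type";
  static final String TARGET_TYPE_TAG = "gson.target_type";

  private GsonTypeTags() {}

  /** Type being serialized by toJson; Gson's toJson(null) returns the string "null". */
  static String sourceType(Object src) {
    return src == null ? "null" : src.getClass().getTypeName();
  }

  /** Type requested from fromJson; Type covers both Class and generic types such as List<String>. */
  static String targetType(Type typeOfT) {
    return typeOfT == null ? "unknown" : typeOfT.getTypeName();
  }
}

class GsonTypeTagsDemo {
  public static void main(String[] args) {
    // Class.getTypeName() renders arrays readably (e.g. "int[]" rather than "[I").
    System.out.println(GsonTypeTags.SOURCE_TYPE_TAG + "=" + GsonTypeTags.sourceType(new int[] {1, 2}));
    System.out.println(GsonTypeTags.TARGET_TYPE_TAG + "=" + GsonTypeTags.targetType(String.class));
  }
}
```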

📚 Comprehensive Analysis

See the eval-comparison/ directory in apm-instrumentation-toolkit for the detailed evaluation.

🎓 Evaluation Outcome

Overall Score: Generated: 7.8/10 | Original: 7.5/10

Recommendation: Adopt with modifications - restore span metadata and add ClassLoader matcher.

🤖 Generated with apm-instrumentation-toolkit | Run #4 (Blind Test)

…ate) Generated by apm-instrumentation-toolkit using java_integration workflow. This is a BLIND TEST run - gson was deleted from repo before generation. Agent had ZERO access to original implementation (shallow clone + config override).

**Generation Metrics:**
- Runtime: 425.3s (7.1 minutes)
- Agent turns: 96
- Cost: $3.29

**Layer 1 Validation:** ✅ ALL PASS
- compileJava: ✅ PASS
- spotlessCheck: ✅ PASS
- codenarcTest: ✅ PASS
- muzzle: ✅ PASS
- test: ✅ PASS
- latestDepTest: ✅ PASS

**Key Innovations:**
- NEW: GsonHelper abstraction class for CallDepthThreadLocalMap
- Broader method matchers (catches all toJson/fromJson overloads)
- Cleaner code structure with consistent naming

**Contamination Check:** ✅ ZERO
- Verified agent logs show no git show commands
- All file paths show /tmp/dd-trace-java-gson-clean/
- Agent used jackson-core and hystrix as references (both exist in clean clone)

**Evaluation:** See eval-comparison/ directory for comprehensive analysis

🤖 Generated with apm-instrumentation-toolkit

pr-commenter · 2026-03-23T17:39:25Z

Benchmarks

Startup

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
git_branch	master	apm-ai-toolkit/java_integration/gson/20260323-115140
git_commit_date	1774050014	1774284786
git_commit_sha	`c00f676`	`668e513`
release_version	1.61.0-SNAPSHOT~c00f676bb9	1.61.0-SNAPSHOT~668e51355f

Matching parameters

	Baseline	Candidate
application	insecure-bank	insecure-bank
ci_job_date	1774286550	1774286550
ci_job_id	1531315758	1531315758
ci_pipeline_id	104030890	104030890
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version	Linux runner-zfyrx7zua-project-304-concurrent-0-4gno13ks 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux	Linux runner-zfyrx7zua-project-304-concurrent-0-4gno13ks 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
module	Agent	Agent
parent	None	None

Summary

Found 1 performance improvement and 0 performance regressions! Performance is the same for 60 metrics; 10 metrics are unstable.

scenario	Δ mean execution_time	candidate mean execution_time	baseline mean execution_time
scenario:startup:petclinic:iast:Remote Config	better [-32.301µs; -13.265µs] or [-6.003%; -2.465%]	515.251µs	538.034µs
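The interval in this row is a range on the difference of means. It is centered on candidate minus baseline (515.251 - 538.034 = -22.783 µs), and the percentage bounds appear to be the absolute bounds divided by the baseline mean, matching the reported values to within rounding of the printed means. The sketch below reproduces that arithmetic. Its verdict function is a simplified assumption: it reads only the sign of the interval for a lower-is-better metric. The bot also emits "unstable" and "unsure" verdicts (see the Load summary) by criteria this report does not show. BenchmarkDelta is a hypothetical name.

```java
import java.util.Locale;

// Illustrative arithmetic for one row of the benchmark summary; the verdict rule is a simplified assumption.
final class BenchmarkDelta {
  private BenchmarkDelta() {}

  /** Expresses an absolute delta (same unit as the baseline mean) as a percentage of the baseline mean. */
  static double percentOfBaseline(double delta, double baselineMean) {
    return 100.0 * delta / baselineMean;
  }

  /**
   * Sign-only reading of a confidence interval on (candidate - baseline) for a lower-is-better
   * metric such as execution time. The real bot has further verdicts not modelled here.
   */
  static String signVerdict(double low, double high) {
    if (high < 0) {
      return "better"; // whole interval is faster than baseline
    }
    if (low > 0) {
      return "worse"; // whole interval is slower than baseline
    }
    return "same"; // interval straddles zero
  }

  public static void main(String[] args) {
    double baseline = 538.034; // microseconds, baseline mean for startup:petclinic:iast:Remote Config
    double candidate = 515.251; // microseconds, candidate mean
    System.out.println(String.format(Locale.ROOT, "mean delta = %.3f us", candidate - baseline));
    System.out.println(String.format(Locale.ROOT, "bounds = [%.2f%%; %.2f%%]",
        percentOfBaseline(-32.301, baseline), percentOfBaseline(-13.265, baseline)));
    System.out.println(signVerdict(-32.301, -13.265));
  }
}
```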

Startup time reports for petclinic

Chart: petclinic - global startup overhead (candidate=1.61.0-SNAPSHOT~668e51355f, baseline=1.61.0-SNAPSHOT~c00f676bb9); the per-variant Agent and Total durations are listed in the results tables below.

baseline results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.053 s	-
Agent	appsec	1.246 s	192.697 ms (18.3%)
Agent	iast	1.23 s	176.403 ms (16.7%)
Agent	profiling	1.183 s	129.576 ms (12.3%)
Total	tracing	10.928 s	-
Total	appsec	11.12 s	191.849 ms (1.8%)
Total	iast	11.262 s	334.504 ms (3.1%)
Total	profiling	10.963 s	35.271 ms (0.3%)

candidate results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.058 s	-
Agent	appsec	1.256 s	198.299 ms (18.7%)
Agent	iast	1.234 s	175.805 ms (16.6%)
Agent	profiling	1.199 s	141.647 ms (13.4%)
Total	tracing	11.007 s	-
Total	appsec	11.259 s	251.568 ms (2.3%)
Total	iast	11.376 s	369.038 ms (3.4%)
Total	profiling	11.055 s	47.171 ms (0.4%)
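The "Δ tracing" column appears to be each variant's duration minus the tracing-only duration for the same module, with the percentage taken relative to the tracing-only run. The sketch below checks that reading against the baseline Agent/appsec row, using the raw microsecond values from the report's chart data (1,053,300 µs for tracing, 1,245,997 µs for appsec). StartupOverhead is a hypothetical name, and the formula is inferred from the tables rather than taken from the benchmark tooling.

```java
import java.util.Locale;

// Reproduces the "Δ tracing" column: a variant's startup time minus the tracing-only run,
// with the percentage taken relative to the tracing-only run (inferred from the tables).
final class StartupOverhead {
  private StartupOverhead() {}

  /** Absolute overhead in milliseconds, from durations given in microseconds. */
  static double deltaMillis(long variantMicros, long tracingMicros) {
    return (variantMicros - tracingMicros) / 1000.0;
  }

  /** Overhead as a percentage of the tracing-only duration. */
  static double deltaPercent(long variantMicros, long tracingMicros) {
    return 100.0 * (variantMicros - tracingMicros) / tracingMicros;
  }

  public static void main(String[] args) {
    long tracing = 1_053_300L; // baseline Agent, tracing (microseconds)
    long appsec = 1_245_997L; // baseline Agent, appsec (microseconds)
    // prints "192.697 ms (18.3%)", matching the baseline Agent/appsec row
    System.out.println(String.format(Locale.ROOT, "%.3f ms (%.1f%%)",
        deltaMillis(appsec, tracing), deltaPercent(appsec, tracing)));
  }
}
```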

petclinic - startup breakdown per module (candidate=1.61.0-SNAPSHOT~668e51355f, baseline=1.61.0-SNAPSHOT~c00f676bb9)

Variant	Module	Baseline	Candidate
tracing	crashtracking	1.21 ms	1.219 ms
tracing	BytebuddyAgent	626.936 ms	629.134 ms
tracing	AgentMeter	29.243 ms	29.358 ms
tracing	GlobalTracer	255.94 ms	257.109 ms
tracing	AppSec	31.598 ms	31.768 ms
tracing	Debugger	60.43 ms	60.33 ms
tracing	Remote Config	590.817 µs	590.862 µs
tracing	Telemetry	7.989 ms	8.068 ms
tracing	Flare Poller	3.56 ms	4.307 ms
appsec	crashtracking	1.205 ms	1.218 ms
appsec	BytebuddyAgent	658.283 ms	661.683 ms
appsec	AgentMeter	12.105 ms	12.304 ms
appsec	GlobalTracer	257.853 ms	260.959 ms
appsec	IAST	24.142 ms	24.657 ms
appsec	AppSec	177.599 ms	179.484 ms
appsec	Debugger	65.93 ms	66.779 ms
appsec	Remote Config	631.667 µs	624.83 µs
appsec	Telemetry	8.365 ms	8.416 ms
appsec	Flare Poller	3.623 ms	3.657 ms
iast	crashtracking	1.205 ms	1.214 ms
iast	BytebuddyAgent	796.847 ms	800.204 ms
iast	AgentMeter	11.422 ms	11.606 ms
iast	GlobalTracer	247.635 ms	247.98 ms
iast	IAST	25.431 ms	25.433 ms
iast	AppSec	26.665 ms	26.683 ms
iast	Debugger	70.52 ms	68.561 ms
iast	Remote Config	538.034 µs	515.251 µs
iast	Telemetry	9.825 ms	11.276 ms
iast	Flare Poller	3.479 ms	3.949 ms
profiling	crashtracking	1.17 ms	1.188 ms
profiling	BytebuddyAgent	682.794 ms	692.943 ms
profiling	AgentMeter	8.986 ms	9.102 ms
profiling	GlobalTracer	215.459 ms	218.223 ms
profiling	AppSec	32.086 ms	32.703 ms
profiling	Debugger	64.47 ms	66.623 ms
profiling	Remote Config	564.797 µs	586.107 µs
profiling	Telemetry	8.48 ms	7.828 ms
profiling	Flare Poller	4.21 ms	3.551 ms
profiling	ProfilingAgent	93.724 ms	94.876 ms
profiling	Profiling	94.285 ms	95.442 ms

Startup time reports for insecure-bank

Chart: insecure-bank - global startup overhead (candidate=1.61.0-SNAPSHOT~668e51355f, baseline=1.61.0-SNAPSHOT~c00f676bb9); the per-variant Agent and Total durations are listed in the results tables below.

baseline results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.061 s	-
Agent	iast	1.222 s	160.83 ms (15.2%)
Total	tracing	8.836 s	-
Total	iast	9.527 s	691.683 ms (7.8%)

candidate results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.058 s	-
Agent	iast	1.226 s	167.519 ms (15.8%)
Total	tracing	8.838 s	-
Total	iast	9.539 s	701.293 ms (7.9%)

insecure-bank - startup breakdown per module (candidate=1.61.0-SNAPSHOT~668e51355f, baseline=1.61.0-SNAPSHOT~c00f676bb9)

Variant	Module	Baseline	Candidate
tracing	crashtracking	1.23 ms	1.227 ms
tracing	BytebuddyAgent	632.895 ms	629.962 ms
tracing	AgentMeter	29.574 ms	29.349 ms
tracing	GlobalTracer	257.364 ms	257.18 ms
tracing	AppSec	31.632 ms	31.756 ms
tracing	Debugger	59.611 ms	59.599 ms
tracing	Remote Config	585.298 µs	592.191 µs
tracing	Telemetry	8.034 ms	8.163 ms
tracing	Flare Poller	4.249 ms	4.36 ms
iast	crashtracking	1.213 ms	1.233 ms
iast	BytebuddyAgent	792.974 ms	795.263 ms
iast	AgentMeter	11.383 ms	11.358 ms
iast	GlobalTracer	245.929 ms	247.186 ms
iast	IAST	25.28 ms	25.379 ms
iast	AppSec	26.429 ms	26.508 ms
iast	Debugger	67.166 ms	67.077 ms
iast	Remote Config	523.851 µs	529.501 µs
iast	Telemetry	11.175 ms	11.249 ms
iast	Flare Poller	3.994 ms	3.958 ms

Load

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
git_branch	master	apm-ai-toolkit/java_integration/gson/20260323-115140
git_commit_date	1774050014	1774284786
git_commit_sha	`c00f676`	`668e513`
release_version	1.61.0-SNAPSHOT~c00f676bb9	1.61.0-SNAPSHOT~668e51355f

Matching parameters

	Baseline	Candidate
application	insecure-bank	insecure-bank
ci_job_date	1774287031	1774287031
ci_job_id	1531315760	1531315760
ci_pipeline_id	104030890	104030890
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version	Linux runner-zfyrx7zua-project-304-concurrent-0-4jg7t0xh 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux	Linux runner-zfyrx7zua-project-304-concurrent-0-4jg7t0xh 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 4 performance improvements and 1 performance regression! Performance is the same for 16 metrics; 15 metrics are unstable.

scenario	Δ mean agg_http_req_duration_p50	Δ mean agg_http_req_duration_p95	Δ mean throughput	candidate mean agg_http_req_duration_p50	candidate mean agg_http_req_duration_p95	candidate mean throughput	baseline mean agg_http_req_duration_p50	baseline mean agg_http_req_duration_p95	baseline mean throughput
scenario:load:insecure-bank:profiling:high_load	better [-256.257µs; -122.447µs] or [-14.126%; -6.750%]	unstable [-1422.000µs; -550.633µs] or [-25.652%; -9.933%]	unstable [+68.813op/s; +576.750op/s] or [+3.522%; +29.523%]	1.625ms	4.557ms	2276.344op/s	1.814ms	5.543ms	1953.562op/s
scenario:load:insecure-bank:iast_GLOBAL:high_load	better [-281.158µs; -124.481µs] or [-9.290%; -4.113%]	better [-619.421µs; -225.564µs] or [-7.440%; -2.709%]	unstable [-69.745op/s; +201.557op/s] or [-5.747%; +16.608%]	2.824ms	7.903ms	1279.500op/s	3.027ms	8.326ms	1213.594op/s
scenario:load:insecure-bank:iast_FULL:high_load	better [-402.053µs; -119.573µs] or [-7.356%; -2.188%]	same [-543.990µs; +147.851µs] or [-4.242%; +1.153%]	unstable [-50.833op/s; +110.271op/s] or [-6.701%; +14.536%]	5.205ms	12.626ms	788.344op/s	5.465ms	12.824ms	758.625op/s
scenario:load:petclinic:appsec:high_load	worse [+0.811ms; +1.875ms] or [+4.409%; +10.200%]	unsure [+0.450ms; +2.200ms] or [+1.478%; +7.227%]	unstable [-35.281op/s; +10.593op/s] or [-14.300%; +4.294%]	19.729ms	31.769ms	234.375op/s	18.386ms	30.444ms	246.719op/s

Request duration reports for petclinic

Chart: petclinic - request duration [CI 0.99] (candidate=1.61.0-SNAPSHOT~668e51355f, baseline=1.61.0-SNAPSHOT~c00f676bb9); the means and 99% confidence intervals are listed in the results tables below.

baseline results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	19.136 ms [18.944 ms, 19.328 ms]	-
appsec	18.913 ms [18.721 ms, 19.105 ms]	-223.052 µs (-1.2%)
code_origins	17.642 ms [17.468 ms, 17.815 ms]	-1.494 ms (-7.8%)
iast	17.807 ms [17.63 ms, 17.983 ms]	-1.329 ms (-6.9%)
profiling	18.568 ms [18.383 ms, 18.754 ms]	-567.544 µs (-3.0%)
tracing	17.73 ms [17.554 ms, 17.905 ms]	-1.406 ms (-7.3%)

candidate results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	18.065 ms [17.88 ms, 18.25 ms]	-
appsec	19.922 ms [19.715 ms, 20.129 ms]	1.857 ms (10.3%)
code_origins	17.657 ms [17.483 ms, 17.831 ms]	-407.898 µs (-2.3%)
iast	18.068 ms [17.888 ms, 18.248 ms]	3.237 µs (0.0%)
profiling	18.512 ms [18.331 ms, 18.694 ms]	447.312 µs (2.5%)
tracing	17.532 ms [17.356 ms, 17.708 ms]	-532.608 µs (-2.9%)

Request duration reports for insecure-bank

Chart: insecure-bank - request duration [CI 0.99] (candidate=1.61.0-SNAPSHOT~668e51355f, baseline=1.61.0-SNAPSHOT~c00f676bb9); the means and 99% confidence intervals are listed in the results tables below.

baseline results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	1.182 ms [1.17 ms, 1.194 ms]	-
iast	3.121 ms [3.08 ms, 3.162 ms]	1.939 ms (164.0%)
iast_FULL	6.096 ms [6.033 ms, 6.159 ms]	4.914 ms (415.6%)
iast_GLOBAL	3.782 ms [3.719 ms, 3.846 ms]	2.6 ms (219.9%)
profiling	2.321 ms [2.297 ms, 2.345 ms]	1.139 ms (96.3%)
tracing	1.774 ms [1.76 ms, 1.789 ms]	592.218 µs (50.1%)

candidate results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	1.17 ms [1.159 ms, 1.181 ms]	-
iast	3.207 ms [3.164 ms, 3.249 ms]	2.036 ms (174.0%)
iast_FULL	5.867 ms [5.808 ms, 5.927 ms]	4.697 ms (401.4%)
iast_GLOBAL	3.585 ms [3.526 ms, 3.644 ms]	2.415 ms (206.4%)
profiling	1.981 ms [1.964 ms, 1.999 ms]	811.082 µs (69.3%)
tracing	1.788 ms [1.774 ms, 1.803 ms]	618.219 µs (52.8%)

Dacapo

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
git_branch	master	apm-ai-toolkit/java_integration/gson/20260323-115140
git_commit_date	1774050014	1774284786
git_commit_sha	`c00f676`	`668e513`
release_version	1.61.0-SNAPSHOT~c00f676bb9	1.61.0-SNAPSHOT~668e51355f

Matching parameters

	Baseline	Candidate
application	biojava	biojava
ci_job_date	1774286935	1774286935
ci_job_id	1531315761	1531315761
ci_pipeline_id	104030890	104030890
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version	Linux runner-zfyrx7zua-project-304-concurrent-1-j2ddva3c 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux	Linux runner-zfyrx7zua-project-304-concurrent-1-j2ddva3c 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics; 1 metric is unstable.

Execution time for biojava

Chart: biojava - execution time [CI 0.99] (candidate=1.61.0-SNAPSHOT~668e51355f, baseline=1.61.0-SNAPSHOT~c00f676bb9); the values are listed in the results tables below.

baseline results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	14.847 s [14.847 s, 14.847 s]	-
appsec	14.814 s [14.814 s, 14.814 s]	-33.0 ms (-0.2%)
iast	18.905 s [18.905 s, 18.905 s]	4.058 s (27.3%)
iast_GLOBAL	17.785 s [17.785 s, 17.785 s]	2.938 s (19.8%)
profiling	15.011 s [15.011 s, 15.011 s]	164.0 ms (1.1%)
tracing	14.98 s [14.98 s, 14.98 s]	133.0 ms (0.9%)

candidate results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	15.516 s [15.516 s, 15.516 s]	-
appsec	14.521 s [14.521 s, 14.521 s]	-995.0 ms (-6.4%)
iast	17.835 s [17.835 s, 17.835 s]	2.319 s (14.9%)
iast_GLOBAL	17.785 s [17.785 s, 17.785 s]	2.269 s (14.6%)
profiling	15.387 s [15.387 s, 15.387 s]	-129.0 ms (-0.8%)
tracing	14.812 s [14.812 s, 14.812 s]	-704.0 ms (-4.5%)

Execution time for tomcat

Chart: tomcat - execution time [CI 0.99] (candidate=1.61.0-SNAPSHOT~668e51355f, baseline=1.61.0-SNAPSHOT~c00f676bb9); the means and 99% confidence intervals are listed in the results tables below.

baseline results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	1.482 ms [1.47 ms, 1.493 ms]	-
appsec	3.79 ms [3.57 ms, 4.009 ms]	2.308 ms (155.7%)
iast	2.261 ms [2.192 ms, 2.33 ms]	778.982 µs (52.6%)
iast_GLOBAL	2.309 ms [2.24 ms, 2.379 ms]	827.511 µs (55.8%)
profiling	2.115 ms [2.059 ms, 2.172 ms]	633.389 µs (42.7%)
tracing	2.085 ms [2.031 ms, 2.139 ms]	602.856 µs (40.7%)

candidate results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	1.479 ms [1.468 ms, 1.491 ms]	-
appsec	3.816 ms [3.594 ms, 4.037 ms]	2.336 ms (157.9%)
iast	2.267 ms [2.198 ms, 2.335 ms]	787.058 µs (53.2%)
iast_GLOBAL	2.312 ms [2.242 ms, 2.381 ms]	832.395 µs (56.3%)
profiling	2.093 ms [2.038 ms, 2.147 ms]	613.211 µs (41.4%)
tracing	2.08 ms [2.027 ms, 2.134 ms]	600.901 µs (40.6%)

PerfectSlayer

Feedback from LP about generated instrumentation

PerfectSlayer · 2026-03-26T10:29:56Z

...umentation/gson/gson-1.6/src/main/java/datadog/trace/instrumentation/gson/GsonDecorator.java

+
+  @Override
+  protected String[] instrumentationNames() {
+    return new String[] {"gson"};


❔ question: Should there be an alias with the version?

PerfectSlayer · 2026-03-26T10:32:25Z

...strumentation/gson/gson-1.6/src/main/java/datadog/trace/instrumentation/gson/GsonHelper.java

+
+import datadog.trace.bootstrap.CallDepthThreadLocalMap;
+
+public class GsonHelper {


❔ question: What are the benefits of such a helper? Only one type is instrumented, so why not use that type for the CallDepthThreadLocalMap calls?

PerfectSlayer · 2026-03-26T10:34:59Z

...rumentation/gson/gson-1.6/src/test/groovy/datadog/trace/instrumentation/gson/GsonTest.groovy

+import datadog.trace.agent.test.InstrumentationSpecification
+import datadog.trace.bootstrap.instrumentation.api.Tags
+
+class GsonTest extends InstrumentationSpecification {



🔨 issue: It's missing error/exception handling, at least.
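The issue above concerns the missing error case in GsonTest.groovy. A test for that path would typically check three properties: a failing call still finishes its span, the span records the exception, and the exception still reaches the caller. The Java sketch below simulates those properties without the agent's test framework. ErrorPathSketch, RecordedSpan, and traced are hypothetical names; real dd-trace-java tests are written in Groovy/Spock against the agent's own span assertions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Illustrative only: a hand-rolled stand-in for span recording, NOT the dd-trace-java
// AgentSpan/decorator API or its Groovy test harness.
final class ErrorPathSketch {
  /** Hypothetical minimal span record. */
  static final class RecordedSpan {
    final String operation;
    Throwable error;
    boolean finished;

    RecordedSpan(String operation) {
      this.operation = operation;
    }
  }

  static final List<RecordedSpan> FINISHED = new ArrayList<>();

  private ErrorPathSketch() {}

  /** Runs body inside a simulated span, recording and rethrowing any unchecked throwable. */
  static <T> T traced(String operation, Supplier<T> body) {
    RecordedSpan span = new RecordedSpan(operation);
    try {
      return body.get();
    } catch (RuntimeException | Error e) {
      span.error = e; // what exit advice would attach to the span on the error path
      throw e;
    } finally {
      span.finished = true; // the span is finished on both the success and error paths
      FINISHED.add(span);
    }
  }

  public static void main(String[] args) {
    try {
      traced("fromJson", () -> {
        throw new IllegalStateException("malformed JSON"); // simulated parse failure
      });
    } catch (IllegalStateException expected) {
      RecordedSpan span = FINISHED.get(0);
      // prints "fromJson finished=true error=malformed JSON"
      System.out.println(span.operation + " finished=" + span.finished + " error=" + span.error.getMessage());
    }
  }
}
```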

jordan-wong assigned PerfectSlayer and jordan-wong Mar 25, 2026

PerfectSlayer added the labels "tag: do not merge" (Do not merge changes) and "tag: experimental" (Experimental changes) Mar 26, 2026

PerfectSlayer reviewed Mar 26, 2026


PerfectSlayer removed their assignment Mar 26, 2026

jordan-wong wants to merge 1 commit into master from apm-ai-toolkit/java_integration/gson/20260323-115140

2 participants