
Add AI/ML extensions and MCP server for LLM-driven JVM diagnostics #810

Open
jbachorik wants to merge 6 commits into develop from claude/btrace-ai-extensions-920Ie

Conversation


@jbachorik jbachorik commented Mar 28, 2026

Summary

Adds five new modules that bring BTrace into the AI/ML observability space:

  • btrace-mcp-server — Model Context Protocol server exposing BTrace as 7 MCP tools (list_jvms, deploy_oneliner, deploy_script, list_probes, send_event, detach_probe, exit_probe) plus 3 diagnostic prompt templates. Enables LLMs like Claude to attach to JVMs and run BTrace probes via natural language.

  • btrace-llm-trace — LLM inference observability: token counts (input/output/cache-read/cache-creation), latency (min/avg/max), cost estimation with cache savings, streaming TTFT. Covers Langchain4j, Spring AI, OpenAI Java SDK, Anthropic Java SDK patterns.

  • btrace-rag-quality — RAG pipeline observability: vector DB query latency, similarity score tracking (top/low/spread), empty retrieval rates, chunk token budgets, end-to-end pipeline timing (retrieval vs generation). Targets Pinecone, Milvus, Weaviate, Chroma, pgvector, Qdrant.

  • btrace-vibe-guard — Runtime behavioral contracts for AI-generated code: latency budgets, call rate limits (sliding-window), range checks, null-safety, boolean assertions, and AI-vs-human code path performance comparison.

  • btrace-gpu-bridge — GPU compute and model inference tracing: ONNX Runtime, DJL, TensorFlow Java session tracking, batch throughput, device memory allocation/peak tracking, native FFM call profiling (cuBLAS, cuDNN).
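
The vibe-guard contracts described above amount to cheap runtime checks around instrumented call sites. As a rough illustration (the class and method names here are hypothetical, not the extension's actual API), a latency-budget counter can be as small as:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a latency-budget contract in the spirit of
// btrace-vibe-guard; names are illustrative, not the real extension API.
final class LatencyBudget {
    private final long budgetNanos;
    private final AtomicLong violations = new AtomicLong();

    LatencyBudget(long budgetMillis) {
        this.budgetNanos = budgetMillis * 1_000_000L;
    }

    // Record one observed call duration; count it if it exceeds the budget.
    void record(long elapsedNanos) {
        if (elapsedNanos > budgetNanos) {
            violations.incrementAndGet();
        }
    }

    long violations() {
        return violations.get();
    }

    public static void main(String[] args) {
        LatencyBudget budget = new LatencyBudget(50); // 50 ms budget
        budget.record(10_000_000L);  // 10 ms: within budget
        budget.record(80_000_000L);  // 80 ms: violation
        System.out.println(budget.violations()); // prints 1
    }
}
```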

Design principles

All four extensions (everything except the MCP server) follow the same architecture:

  • Zero allocation on hot paths — fluent builders (llm.call("model").inputTokens(1500).record()) use ThreadLocal pooling, reusing one instance per thread instead of allocating per call
  • Lock-free concurrency — all statistics use AtomicLong with CAS loops for min/max, no locks anywhere
  • Zero external dependencies — instrument existing client library classes, no additional JARs required
  • Standard BTrace extension pattern: @ServiceDescriptor interfaces in src/api/, isolated implementations in src/impl/, auto-discovered via the Gradle extension plugin
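
The first principle can be sketched as a ThreadLocal-pooled fluent builder. This is an illustrative reduction of the pattern, not the extension's exact code; the class and method names mirror the llm.call("model").inputTokens(1500).record() example but are assumptions:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a ThreadLocal-pooled fluent builder: one mutable record per
// thread is reused across calls, so steady-state recording allocates nothing.
// Names are illustrative, not the btrace-llm-trace API.
final class LlmTrace {
    private final AtomicLong totalInputTokens = new AtomicLong();

    // Mutable, reusable per-thread record; reset by call().
    final class CallRecord {
        String model;
        long inputTokens;

        CallRecord inputTokens(long n) { this.inputTokens = n; return this; }

        void record() { totalInputTokens.addAndGet(inputTokens); }
    }

    private final ThreadLocal<CallRecord> pool =
            ThreadLocal.withInitial(CallRecord::new);

    CallRecord call(String model) {
        CallRecord r = pool.get();   // reuse the per-thread instance
        r.model = model;
        r.inputTokens = 0;           // reset stale state
        return r;
    }

    public static void main(String[] args) {
        LlmTrace llm = new LlmTrace();
        CallRecord a = llm.call("model-a").inputTokens(1500);
        a.record();
        CallRecord b = llm.call("model-b").inputTokens(500);
        b.record();
        // Same thread -> same pooled instance; token counts still aggregate.
        System.out.println((a == b) + " " + llm.totalInputTokens.get());
        // prints: true 2000
    }
}
```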

Files

| Area | Files | Lines |
| --- | --- | --- |
| MCP server | 12 source files | ~2,000 |
| LLM trace extension | 5 source + 1 test | ~1,100 |
| RAG quality extension | 5 source + 1 test | ~800 |
| Vibe guard extension | 3 source + 1 test | ~700 |
| GPU bridge extension | 5 source + 1 test | ~800 |
| Sample BTrace scripts | 4 scripts | ~335 |
| Total | 41 files | ~6,200 |

Test plan

  • Unit tests pass for btrace-llm-trace (20+ tests: duration-only, fluent builder, cache tokens, embeddings, concurrency)
  • Unit tests pass for btrace-rag-quality (18 tests: queries, similarity scores, pipelines, empty retrievals, concurrency)
  • Unit tests pass for btrace-vibe-guard (18 tests: latency budgets, rate limits, assertions, range checks, AI vs human paths, concurrency)
  • Unit tests pass for btrace-gpu-bridge (18 tests: inference, memory alloc/free, native calls, model load, formatters, concurrency)
  • MCP server compiles and handles the JSON-RPC lifecycle (initialize, tools/list, tools/call)
  • Sample BTrace scripts compile against extension APIs
  • Existing BTrace tests unaffected (no changes to core beyond 3 visibility modifiers in Client.java)
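
The MCP server's JSON-RPC lifecycle boils down to routing three method names over stdio. The sketch below shows only that routing shape; the response payloads are abbreviated placeholders and none of this is the server's real code (JSON parsing is elided entirely):

```java
// Minimal sketch of the stdio JSON-RPC method routing an MCP server follows
// (initialize -> tools/list -> tools/call). Payloads are placeholders, not
// the real server's responses; -32601 is JSON-RPC's "method not found" code.
final class McpDispatch {
    static String handle(String method) {
        switch (method) {
            case "initialize":
                return "{\"capabilities\":{\"tools\":{},\"prompts\":{}}}";
            case "tools/list":
                return "{\"tools\":[{\"name\":\"list_jvms\"},{\"name\":\"deploy_oneliner\"}]}";
            case "tools/call":
                return "{\"content\":[{\"type\":\"text\",\"text\":\"...\"}]}";
            default:
                return "{\"error\":{\"code\":-32601,\"message\":\"method not found\"}}";
        }
    }

    public static void main(String[] args) {
        System.out.println(handle("tools/list").contains("list_jvms"));
        // prints: true
    }
}
```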

https://claude.ai/code/session_012KcpiFxvscxzWWgN5LiRcB



claude added 6 commits March 28, 2026 15:12
Implements an MCP (Model Context Protocol) server that exposes BTrace
operations as tools over stdio JSON-RPC transport. This allows LLM clients
(Claude Desktop, Claude Code, Cursor, etc.) to attach to running JVMs and
deploy diagnostic probes conversationally.

MCP tools: list_jvms, deploy_oneliner, deploy_script, list_probes,
send_event, detach_probe, exit_probe.

MCP prompts: diagnose_slow_endpoint, find_exception_source, profile_method.

Also makes Client.connectAndListProbes, isDisconnected, and disconnect
public for cross-module access.

https://claude.ai/code/session_012KcpiFxvscxzWWgN5LiRcB
Use Java 11 source/target (MCP server needs 11+ APIs) while keeping
the project's standard JDK 11 toolchain from common.gradle. Restore
tools.jar compileOnly dependency for sun.jvmstat access.

https://claude.ai/code/session_012KcpiFxvscxzWWgN5LiRcB
Documents MCP tools, prompts, build instructions, and configuration
for Claude Desktop, Claude Code, and Cursor integration.

https://claude.ai/code/session_012KcpiFxvscxzWWgN5LiRcB
New BTrace extension that provides an LlmTraceService for recording and
aggregating LLM API call metrics from BTrace scripts:

- Per-model token counts (input/output), latency (min/avg/max)
- Streaming call tracking with time-to-first-token
- Tool/function call counting
- Error tracking by type
- Built-in cost estimation for Claude, GPT, Gemini model families
- Thread-safe lock-free implementation using AtomicLong counters
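
The lock-free AtomicLong approach mentioned above can be sketched generically. This is the textbook CAS-loop pattern for min/max tracking, not the module's exact implementation:

```java
import java.util.concurrent.atomic.AtomicLong;

// Generic sketch of lock-free min/avg/max latency tracking with AtomicLong
// CAS loops; illustrative only, not btrace-llm-trace's exact code.
final class LatencyStats {
    private final AtomicLong min = new AtomicLong(Long.MAX_VALUE);
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);
    private final AtomicLong count = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();

    void record(long nanos) {
        count.incrementAndGet();
        totalNanos.addAndGet(nanos);
        // CAS loop: retry until the value is no longer a new extreme
        // or we successfully publish it. No locks, no blocking.
        long cur;
        while (nanos < (cur = min.get()) && !min.compareAndSet(cur, nanos)) { }
        while (nanos > (cur = max.get()) && !max.compareAndSet(cur, nanos)) { }
    }

    long min() { return min.get(); }
    long max() { return max.get(); }
    long avg() { long c = count.get(); return c == 0 ? 0 : totalNanos.get() / c; }

    public static void main(String[] args) {
        LatencyStats stats = new LatencyStats();
        stats.record(300);
        stats.record(100);
        stats.record(200);
        System.out.println(stats.min() + " " + stats.max() + " " + stats.avg());
        // prints: 100 300 200
    }
}
```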

Extension structure follows btrace-metrics/btrace-utils pattern:
- API: LlmTraceService interface with @ServiceDescriptor
- Impl: LlmTraceServiceImpl extending Extension, zero external deps
- Tests: 14 tests covering aggregation, cost estimation, concurrency

Includes sample BTrace script (LlmTrace.java) demonstrating usage with
Langchain4j ChatLanguageModel instrumentation.

https://claude.ai/code/session_012KcpiFxvscxzWWgN5LiRcB
The fluent builder (llm.call("model").inputTokens(...).record()) now
reuses a per-thread CallRecordImpl instance instead of allocating a new
object on every call. This eliminates GC pressure on hot-path
instrumentation while preserving the ergonomic fluent API.

Also includes earlier API refinements: simplified recording methods,
cache token tracking, embedding support, and comprehensive tests.

https://claude.ai/code/session_012KcpiFxvscxzWWgN5LiRcB
…nsions

Three new AI/computing extensions following the BTrace extension pattern:

- btrace-rag-quality: RAG pipeline observability — vector DB query latency,
  similarity scores, empty retrieval rates, chunk quality. Supports Pinecone,
  Milvus, Weaviate, Chroma, pgvector, Qdrant.

- btrace-vibe-guard: Runtime behavioral contracts for AI-generated code —
  latency budgets, call rate limits, range checks, null-safety enforcement,
  AI vs human code path comparison.

- btrace-gpu-bridge: GPU compute and model inference tracing — ONNX Runtime,
  DJL, TensorFlow Java. Tracks inference latency, batch throughput, device
  memory allocation, native FFM calls to CUDA/cuBLAS.

All three use ThreadLocal-pooled fluent builders (zero allocation),
lock-free AtomicLong statistics, and include comprehensive tests.
Sample BTrace scripts included for each.

https://claude.ai/code/session_012KcpiFxvscxzWWgN5LiRcB
@jbachorik jbachorik changed the title Add btrace-mcp-server module for LLM-driven JVM diagnostics Add AI/ML extensions and MCP server for LLM-driven JVM diagnostics Mar 28, 2026