Add AI/ML extensions and MCP server for LLM-driven JVM diagnostics #810
Open
Conversation
Implements an MCP (Model Context Protocol) server that exposes BTrace operations as tools over a stdio JSON-RPC transport. This allows LLM clients (Claude Desktop, Claude Code, Cursor, etc.) to attach to running JVMs and deploy diagnostic probes conversationally.

MCP tools: list_jvms, deploy_oneliner, deploy_script, list_probes, send_event, detach_probe, exit_probe.
MCP prompts: diagnose_slow_endpoint, find_exception_source, profile_method.

Also makes Client.connectAndListProbes, isDisconnected, and disconnect public for cross-module access. https://claude.ai/code/session_012KcpiFxvscxzWWgN5LiRcB
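Over the stdio transport, a client drives these tools with standard MCP `tools/call` requests. A hedged sketch of what invoking `deploy_oneliner` might look like — the envelope follows the MCP spec, but the argument names (`pid`, `oneliner`) are assumptions, not the schema this PR actually defines:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "deploy_oneliner",
    "arguments": {
      "pid": "12345",
      "oneliner": "..."
    }
  }
}
```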
Use Java 11 source/target (MCP server needs 11+ APIs) while keeping the project's standard JDK 11 toolchain from common.gradle. Restore tools.jar compileOnly dependency for sun.jvmstat access. https://claude.ai/code/session_012KcpiFxvscxzWWgN5LiRcB
Documents MCP tools, prompts, build instructions, and configuration for Claude Desktop, Claude Code, and Cursor integration. https://claude.ai/code/session_012KcpiFxvscxzWWgN5LiRcB
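For orientation, MCP clients such as Claude Desktop register stdio servers in their config file with roughly this shape; the server name, launch command, and jar path below are placeholders, not taken from this PR's docs:

```json
{
  "mcpServers": {
    "btrace": {
      "command": "java",
      "args": ["-jar", "/path/to/btrace-mcp-server.jar"]
    }
  }
}
```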
New BTrace extension that provides an LlmTraceService for recording and aggregating LLM API call metrics from BTrace scripts:
- Per-model token counts (input/output), latency (min/avg/max)
- Streaming call tracking with time-to-first-token
- Tool/function call counting
- Error tracking by type
- Built-in cost estimation for Claude, GPT, Gemini model families
- Thread-safe, lock-free implementation using AtomicLong counters

Extension structure follows the btrace-metrics/btrace-utils pattern:
- API: LlmTraceService interface with @ServiceDescriptor
- Impl: LlmTraceServiceImpl extending Extension, zero external deps
- Tests: 14 tests covering aggregation, cost estimation, concurrency

Includes a sample BTrace script (LlmTrace.java) demonstrating usage with Langchain4j ChatLanguageModel instrumentation. https://claude.ai/code/session_012KcpiFxvscxzWWgN5LiRcB
The fluent builder (llm.call("model").inputTokens(...).record()) now
reuses a per-thread CallRecordImpl instance instead of allocating a new
object on every call. This eliminates GC pressure on hot-path
instrumentation while preserving the ergonomic fluent API.
Also includes earlier API refinements: simplified recording methods,
cache token tracking, embedding support, and comprehensive tests.
https://claude.ai/code/session_012KcpiFxvscxzWWgN5LiRcB
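The per-thread reuse described above can be sketched as follows. This is a minimal illustration of the pooling pattern, not the extension's actual code: the class and method names (`PooledRecorder`, `CallRecord`, `totalInputTokens`) are invented for the example, and only the fluent shape `call("model").inputTokens(n).record()` mirrors the API the commit describes.

```java
import java.util.concurrent.atomic.AtomicLong;

public class PooledRecorder {
    private final AtomicLong totalInputTokens = new AtomicLong();

    public final class CallRecord {
        private String model;
        private long inputTokens;

        CallRecord reset(String model) { this.model = model; this.inputTokens = 0; return this; }
        public CallRecord inputTokens(long n) { this.inputTokens = n; return this; }
        public void record() { totalInputTokens.addAndGet(inputTokens); }
    }

    // One reusable CallRecord per thread: after the first call on a thread,
    // the fluent chain allocates nothing, which is what removes GC pressure
    // on hot-path instrumentation.
    private final ThreadLocal<CallRecord> pool = ThreadLocal.withInitial(() -> new CallRecord());

    public CallRecord call(String model) { return pool.get().reset(model); }
    public long totalInputTokens() { return totalInputTokens.get(); }

    public static void main(String[] args) {
        PooledRecorder llm = new PooledRecorder();
        CallRecord first = llm.call("claude-3").inputTokens(1500);
        first.record();
        CallRecord second = llm.call("claude-3");
        System.out.println(first == second);        // true: same pooled instance on this thread
        System.out.println(llm.totalInputTokens()); // 1500
    }
}
```

The trade-off is that a `CallRecord` is only valid until the next `call(...)` on the same thread, so callers must finish the chain with `record()` before starting another call.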
…nsions

Three new AI/computing extensions following the BTrace extension pattern:
- btrace-rag-quality: RAG pipeline observability — vector DB query latency, similarity scores, empty retrieval rates, chunk quality. Supports Pinecone, Milvus, Weaviate, Chroma, pgvector, Qdrant.
- btrace-vibe-guard: runtime behavioral contracts for AI-generated code — latency budgets, call rate limits, range checks, null-safety enforcement, AI vs. human code path comparison.
- btrace-gpu-bridge: GPU compute and model inference tracing — ONNX Runtime, DJL, TensorFlow Java. Tracks inference latency, batch throughput, device memory allocation, native FFM calls to CUDA/cuBLAS.

All three use ThreadLocal-pooled fluent builders (zero allocation), lock-free AtomicLong statistics, and include comprehensive tests. Sample BTrace scripts are included for each. https://claude.ai/code/session_012KcpiFxvscxzWWgN5LiRcB
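The "lock-free AtomicLong statistics" mentioned above typically means a CAS retry loop for min/max tracking. A minimal sketch of the max side of that pattern — the class name and layout here are illustrative, not the extensions' actual implementation:

```java
import java.util.concurrent.atomic.AtomicLong;

public class LockFreeMax {
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    // Raise the recorded maximum without taking a lock: re-read and retry
    // whenever another thread updated the value between our read and CAS.
    public void update(long sample) {
        long cur = max.get();
        while (sample > cur && !max.compareAndSet(cur, sample)) {
            cur = max.get();
        }
    }

    public long get() { return max.get(); }

    public static void main(String[] args) {
        LockFreeMax m = new LockFreeMax();
        m.update(5);
        m.update(12);
        m.update(7);
        System.out.println(m.get()); // 12
    }
}
```

Min tracking is the mirror image (retry while `sample < cur`); counters and sums need no loop at all because `addAndGet` is already atomic.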
Summary
Adds five new modules that bring BTrace into the AI/ML observability space:
btrace-mcp-server — Model Context Protocol server exposing BTrace as 7 MCP tools (list_jvms, deploy_oneliner, deploy_script, list_probes, send_event, detach_probe, exit_probe) plus 3 diagnostic prompt templates. Enables LLMs like Claude to attach to JVMs and run BTrace probes via natural language.
btrace-llm-trace — LLM inference observability: token counts (input/output/cache-read/cache-creation), latency (min/avg/max), cost estimation with cache savings, streaming TTFT. Covers Langchain4j, Spring AI, OpenAI Java SDK, Anthropic Java SDK patterns.
btrace-rag-quality — RAG pipeline observability: vector DB query latency, similarity score tracking (top/low/spread), empty retrieval rates, chunk token budgets, end-to-end pipeline timing (retrieval vs generation). Targets Pinecone, Milvus, Weaviate, Chroma, pgvector, Qdrant.
btrace-vibe-guard — Runtime behavioral contracts for AI-generated code: latency budgets, call rate limits (sliding-window), range checks, null-safety, boolean assertions, and AI-vs-human code path performance comparison.
btrace-gpu-bridge — GPU compute and model inference tracing: ONNX Runtime, DJL, TensorFlow Java session tracking, batch throughput, device memory allocation/peak tracking, native FFM call profiling (cuBLAS, cuDNN).
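As one concrete illustration of the contracts btrace-vibe-guard enforces, a sliding-window call rate limit can be sketched as below. This is a simplified stand-in, not the extension's API: the names are invented, and it uses `synchronized` for brevity where the extension describes a lock-free scheme.

```java
import java.util.ArrayDeque;

public class SlidingWindowLimit {
    private final int maxCalls;
    private final long windowMillis;
    private final ArrayDeque<Long> timestamps = new ArrayDeque<>();

    public SlidingWindowLimit(int maxCalls, long windowMillis) {
        this.maxCalls = maxCalls;
        this.windowMillis = windowMillis;
    }

    // Returns true if the call stays within budget, false if the
    // rate-limit contract is violated.
    public synchronized boolean tryCall(long nowMillis) {
        // Evict timestamps that have slid out of the window.
        while (!timestamps.isEmpty() && nowMillis - timestamps.peekFirst() >= windowMillis) {
            timestamps.pollFirst();
        }
        if (timestamps.size() >= maxCalls) {
            return false;
        }
        timestamps.addLast(nowMillis);
        return true;
    }

    public static void main(String[] args) {
        SlidingWindowLimit limit = new SlidingWindowLimit(2, 1000);
        System.out.println(limit.tryCall(0));    // true
        System.out.println(limit.tryCall(10));   // true
        System.out.println(limit.tryCall(20));   // false: third call inside the 1s window
        System.out.println(limit.tryCall(1500)); // true: window slid past the first calls
    }
}
```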
Design principles
All four extensions follow the same architecture:
- Fluent builders (llm.call("model").inputTokens(1500).record()) use ThreadLocal pooling, reusing one instance per thread instead of allocating per call
- AtomicLong with CAS loops for min/max; no locks anywhere
- @ServiceDescriptor interfaces in src/api/, isolated implementations in src/impl/, auto-discovered via the Gradle extension plugin

Files
Test plan
https://claude.ai/code/session_012KcpiFxvscxzWWgN5LiRcB