Add AI/ML extensions and MCP server for LLM-driven JVM diagnostics #810
Open
Conversation
Implements an MCP (Model Context Protocol) server that exposes BTrace operations as tools over a stdio JSON-RPC transport. This allows LLM clients (Claude Desktop, Claude Code, Cursor, etc.) to attach to running JVMs and deploy diagnostic probes conversationally.

MCP tools: list_jvms, deploy_oneliner, deploy_script, list_probes, send_event, detach_probe, exit_probe.
MCP prompts: diagnose_slow_endpoint, find_exception_source, profile_method.

Also makes Client.connectAndListProbes, isDisconnected, and disconnect public for cross-module access. https://claude.ai/code/session_012KcpiFxvscxzWWgN5LiRcB
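Over the stdio transport, a client drives these tools with standard MCP `tools/call` requests. A hedged sketch of what invoking `deploy_oneliner` might look like — the envelope follows the MCP spec, but the argument names (`pid`, `oneliner`) are assumptions, not the schema this PR actually defines:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "deploy_oneliner",
    "arguments": {
      "pid": "12345",
      "oneliner": "..."
    }
  }
}
```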
Use Java 11 source/target (MCP server needs 11+ APIs) while keeping the project's standard JDK 11 toolchain from common.gradle. Restore tools.jar compileOnly dependency for sun.jvmstat access. https://claude.ai/code/session_012KcpiFxvscxzWWgN5LiRcB
Documents MCP tools, prompts, build instructions, and configuration for Claude Desktop, Claude Code, and Cursor integration. https://claude.ai/code/session_012KcpiFxvscxzWWgN5LiRcB
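For orientation, MCP clients such as Claude Desktop register stdio servers in their config file with roughly this shape; the server name, launch command, and jar path below are placeholders, not taken from this PR's docs:

```json
{
  "mcpServers": {
    "btrace": {
      "command": "java",
      "args": ["-jar", "/path/to/btrace-mcp-server.jar"]
    }
  }
}
```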
New BTrace extension that provides an LlmTraceService for recording and aggregating LLM API call metrics from BTrace scripts:
- Per-model token counts (input/output), latency (min/avg/max)
- Streaming call tracking with time-to-first-token
- Tool/function call counting
- Error tracking by type
- Built-in cost estimation for Claude, GPT, Gemini model families
- Thread-safe, lock-free implementation using AtomicLong counters

Extension structure follows the btrace-metrics/btrace-utils pattern:
- API: LlmTraceService interface with @ServiceDescriptor
- Impl: LlmTraceServiceImpl extending Extension, zero external deps
- Tests: 14 tests covering aggregation, cost estimation, concurrency

Includes a sample BTrace script (LlmTrace.java) demonstrating usage with Langchain4j ChatLanguageModel instrumentation. https://claude.ai/code/session_012KcpiFxvscxzWWgN5LiRcB
The fluent builder (llm.call("model").inputTokens(...).record()) now
reuses a per-thread CallRecordImpl instance instead of allocating a new
object on every call. This eliminates GC pressure on hot-path
instrumentation while preserving the ergonomic fluent API.
Also includes earlier API refinements: simplified recording methods,
cache token tracking, embedding support, and comprehensive tests.
https://claude.ai/code/session_012KcpiFxvscxzWWgN5LiRcB
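The per-thread reuse described above can be sketched as follows. This is a minimal illustration of the pooling pattern, not the extension's actual code: the class and method names (`PooledRecorder`, `CallRecord`, `totalInputTokens`) are invented for the example, and only the fluent shape `call("model").inputTokens(n).record()` mirrors the API the commit describes.

```java
import java.util.concurrent.atomic.AtomicLong;

public class PooledRecorder {
    private final AtomicLong totalInputTokens = new AtomicLong();

    public final class CallRecord {
        private String model;
        private long inputTokens;

        CallRecord reset(String model) { this.model = model; this.inputTokens = 0; return this; }
        public CallRecord inputTokens(long n) { this.inputTokens = n; return this; }
        public void record() { totalInputTokens.addAndGet(inputTokens); }
    }

    // One reusable CallRecord per thread: after the first call on a thread,
    // the fluent chain allocates nothing, which is what removes GC pressure
    // on hot-path instrumentation.
    private final ThreadLocal<CallRecord> pool = ThreadLocal.withInitial(() -> new CallRecord());

    public CallRecord call(String model) { return pool.get().reset(model); }
    public long totalInputTokens() { return totalInputTokens.get(); }

    public static void main(String[] args) {
        PooledRecorder llm = new PooledRecorder();
        CallRecord first = llm.call("claude-3").inputTokens(1500);
        first.record();
        CallRecord second = llm.call("claude-3");
        System.out.println(first == second);        // true: same pooled instance on this thread
        System.out.println(llm.totalInputTokens()); // 1500
    }
}
```

The trade-off is that a `CallRecord` is only valid until the next `call(...)` on the same thread, so callers must finish the chain with `record()` before starting another call.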
…nsions

Three new AI/computing extensions following the BTrace extension pattern:
- btrace-rag-quality: RAG pipeline observability — vector DB query latency, similarity scores, empty retrieval rates, chunk quality. Supports Pinecone, Milvus, Weaviate, Chroma, pgvector, Qdrant.
- btrace-vibe-guard: runtime behavioral contracts for AI-generated code — latency budgets, call rate limits, range checks, null-safety enforcement, AI vs. human code path comparison.
- btrace-gpu-bridge: GPU compute and model inference tracing — ONNX Runtime, DJL, TensorFlow Java. Tracks inference latency, batch throughput, device memory allocation, native FFM calls to CUDA/cuBLAS.

All three use ThreadLocal-pooled fluent builders (zero allocation), lock-free AtomicLong statistics, and include comprehensive tests. Sample BTrace scripts are included for each. https://claude.ai/code/session_012KcpiFxvscxzWWgN5LiRcB
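The "lock-free AtomicLong statistics" mentioned above typically means a CAS retry loop for min/max tracking. A minimal sketch of the max side of that pattern — the class name and layout here are illustrative, not the extensions' actual implementation:

```java
import java.util.concurrent.atomic.AtomicLong;

public class LockFreeMax {
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    // Raise the recorded maximum without taking a lock: re-read and retry
    // whenever another thread updated the value between our read and CAS.
    public void update(long sample) {
        long cur = max.get();
        while (sample > cur && !max.compareAndSet(cur, sample)) {
            cur = max.get();
        }
    }

    public long get() { return max.get(); }

    public static void main(String[] args) {
        LockFreeMax m = new LockFreeMax();
        m.update(5);
        m.update(12);
        m.update(7);
        System.out.println(m.get()); // 12
    }
}
```

Min tracking is the mirror image (retry while `sample < cur`); counters and sums need no loop at all because `addAndGet` is already atomic.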
Summary
Adds five new modules that bring BTrace into the AI/ML observability space:
btrace-mcp-server — Model Context Protocol server exposing BTrace as 7 MCP tools (list_jvms, deploy_oneliner, deploy_script, list_probes, send_event, detach_probe, exit_probe) plus 3 diagnostic prompt templates. Enables LLMs like Claude to attach to JVMs and run BTrace probes via natural language.
btrace-llm-trace — LLM inference observability: token counts (input/output/cache-read/cache-creation), latency (min/avg/max), cost estimation with cache savings, streaming TTFT. Covers Langchain4j, Spring AI, OpenAI Java SDK, Anthropic Java SDK patterns.
btrace-rag-quality — RAG pipeline observability: vector DB query latency, similarity score tracking (top/low/spread), empty retrieval rates, chunk token budgets, end-to-end pipeline timing (retrieval vs generation). Targets Pinecone, Milvus, Weaviate, Chroma, pgvector, Qdrant.
btrace-vibe-guard — Runtime behavioral contracts for AI-generated code: latency budgets, call rate limits (sliding-window), range checks, null-safety, boolean assertions, and AI-vs-human code path performance comparison.
btrace-gpu-bridge — GPU compute and model inference tracing: ONNX Runtime, DJL, TensorFlow Java session tracking, batch throughput, device memory allocation/peak tracking, native FFM call profiling (cuBLAS, cuDNN).
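As one concrete illustration of the contracts btrace-vibe-guard enforces, a sliding-window call rate limit can be sketched as below. This is a simplified stand-in, not the extension's API: the names are invented, and it uses `synchronized` for brevity where the extension describes a lock-free scheme.

```java
import java.util.ArrayDeque;

public class SlidingWindowLimit {
    private final int maxCalls;
    private final long windowMillis;
    private final ArrayDeque<Long> timestamps = new ArrayDeque<>();

    public SlidingWindowLimit(int maxCalls, long windowMillis) {
        this.maxCalls = maxCalls;
        this.windowMillis = windowMillis;
    }

    // Returns true if the call stays within budget, false if the
    // rate-limit contract is violated.
    public synchronized boolean tryCall(long nowMillis) {
        // Evict timestamps that have slid out of the window.
        while (!timestamps.isEmpty() && nowMillis - timestamps.peekFirst() >= windowMillis) {
            timestamps.pollFirst();
        }
        if (timestamps.size() >= maxCalls) {
            return false;
        }
        timestamps.addLast(nowMillis);
        return true;
    }

    public static void main(String[] args) {
        SlidingWindowLimit limit = new SlidingWindowLimit(2, 1000);
        System.out.println(limit.tryCall(0));    // true
        System.out.println(limit.tryCall(10));   // true
        System.out.println(limit.tryCall(20));   // false: third call inside the 1s window
        System.out.println(limit.tryCall(1500)); // true: window slid past the first calls
    }
}
```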
Design principles
All four extensions follow the same architecture:
- Fluent builders (llm.call("model").inputTokens(1500).record()) use ThreadLocal pooling, reusing one instance per thread instead of allocating per call
- AtomicLong with CAS loops for min/max; no locks anywhere
- @ServiceDescriptor interfaces in src/api/, isolated implementations in src/impl/, auto-discovered via the Gradle extension plugin

Files
Test plan
https://claude.ai/code/session_012KcpiFxvscxzWWgN5LiRcB